Printed from https://ideas.repec.org/a/eee/csdana/v133y2019icp1-19.html

A novel variational Bayesian method for variable selection in logistic regression models

Author

Listed:
  • Zhang, Chun-Xia
  • Xu, Shuang
  • Zhang, Jiang-She

Abstract

With high-dimensional data emerging in various domains, sparse logistic regression models have attracted considerable interest from researchers. Variable selection plays a key role in both improving prediction accuracy and enhancing the interpretability of the fitted models. Bayesian variable selection approaches offer several advantages, such as high selection accuracy and the ease of incorporating many kinds of prior knowledge. However, because Bayesian methods generally make inference from the posterior distribution with Markov chain Monte Carlo (MCMC) techniques, they become intractable in high-dimensional situations due to the large search space. To address this issue, a novel variational Bayesian method for variable selection in high-dimensional logistic regression models is presented. The proposed method is based on the indicator model, in which each covariate is equipped with a binary latent variable indicating whether it is important. A Bernoulli-type prior is adopted for the latent indicator variable. For the hyperparameter of the Bernoulli prior, two schemes are provided to determine its optimal value so that the model can achieve sparsity adaptively. To identify important variables and make predictions, an efficient variational Bayesian approach is employed to make inference from the posterior distribution. Experiments conducted with both synthetic and publicly available data show that the new method outperforms or is very competitive with some other popular counterparts.
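As a rough illustration of the indicator model described in the abstract, the sketch below simulates data from a generic sparse logistic regression in which each covariate has a binary inclusion indicator with a Bernoulli prior. It is not the authors' formulation or inference algorithm: the function name `simulate_indicator_logistic`, the Gaussian prior on the coefficients, and the values of `pi` and `slab_sd` are assumptions made here for illustration only.

```python
import numpy as np

def simulate_indicator_logistic(n=200, p=50, pi=0.1, slab_sd=2.0, seed=0):
    """Draw data from a generic indicator (spike-and-slab style) logistic model.

    All names and default values are hypothetical; the abstract does not
    specify them.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))           # covariates
    gamma = rng.binomial(1, pi, size=p)       # gamma_j ~ Bernoulli(pi): is covariate j important?
    beta = rng.normal(0.0, slab_sd, size=p)   # assumed Gaussian prior on coefficients
    logits = X @ (gamma * beta)               # only covariates with gamma_j = 1 contribute
    prob = 1.0 / (1.0 + np.exp(-logits))      # logistic link
    y = rng.binomial(1, prob)                 # binary responses
    return X, y, gamma, beta

X, y, gamma, beta = simulate_indicator_logistic()
print("active covariates:", int(gamma.sum()), "of", gamma.size)
```

In this sketch the hyperparameter `pi` controls the expected fraction of active covariates, so smaller values give sparser models. The abstract states that the paper provides two schemes for choosing this hyperparameter; those schemes are not reproduced here.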

Suggested Citation

  • Zhang, Chun-Xia & Xu, Shuang & Zhang, Jiang-She, 2019. "A novel variational Bayesian method for variable selection in logistic regression models," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 1-19.
  • Handle: RePEc:eee:csdana:v:133:y:2019:i:c:p:1-19
    DOI: 10.1016/j.csda.2018.08.025

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947318302081
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2018.08.025?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. David M. Blei & Alp Kucukelbir & Jon D. McAuliffe, 2017. "Variational Inference: A Review for Statisticians," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 859-877, April.
    2. Park, Trevor & Casella, George, 2008. "The Bayesian Lasso," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 681-686, June.
    3. Veronika Ročková & Edward I. George, 2014. "EMVS: The EM Approach to Bayesian Variable Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 828-846, June.
    4. Nicholas G. Polson & James G. Scott & Jesse Windle, 2013. "Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1339-1349, December.
    5. Tian, Guo-Liang & Tang, Man-Lai & Fang, Hong-Bin & Tan, Ming, 2008. "Efficient methods for estimating constrained parameters with applications to regularized (lasso) logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(7), pages 3528-3542, March.
    6. David Rossell & Francisco J. Rubio, 2018. "Tractable Bayesian Variable Selection: Beyond Normality," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(524), pages 1742-1758, October.
    7. David Rossell & Donatello Telesca, 2017. "Nonlocal Priors for High-Dimensional Estimation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 254-265, January.
    8. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    9. Latouche, Pierre & Mattei, Pierre-Alexandre & Bouveyron, Charles & Chiquet, Julien, 2016. "Combining a relaxed EM algorithm with Occam’s razor for Bayesian variable selection in high-dimensional regression," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 177-190.

    Citations

    Cited by:

    1. Lai, Wei-Ting & Chen, Ray-Bing & Chen, Ying & Koch, Thorsten, 2022. "Variational Bayesian inference for network autoregression models," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    2. Yingjun Ma & Yuanyuan Ma, 2024. "Kernel Bayesian logistic tensor decomposition with automatic rank determination for predicting multiple types of miRNA-disease associations," PLOS Computational Biology, Public Library of Science, vol. 20(7), pages 1-23, July.
    3. Shan Feng & Wenxian Xie & Yufeng Nie, 2024. "Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers," Mathematics, MDPI, vol. 12(7), pages 1-23, April.
    4. Li, Xiaowei & Tang, Junqing & Hu, Xiaojiao & Wang, Wei, 2020. "Assessing intercity multimodal choice behavior in a Touristy City: A factor analysis," Journal of Transport Geography, Elsevier, vol. 86(C).
    5. Juanjuan Zhang & Weixian Wang & Mingming Yang & Maozai Tian, 2025. "Variational Bayesian Variable Selection in Logistic Regression Based on Spike-and-Slab Lasso," Mathematics, MDPI, vol. 13(13), pages 1-18, July.
    6. Jianhua Zhao & Changchun Shang & Shulan Li & Ling Xin & Philip L. H. Yu, 2025. "Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 19(1), pages 209-235, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Zhang & Yihui Zhang & Yemao Xia, 2024. "Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations," Mathematics, MDPI, vol. 12(5), pages 1-23, March.
    2. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    3. Bernardi, Mauro & Costola, Michele, 2019. "High-dimensional sparse financial networks through a regularised regression model," SAFE Working Paper Series 244, Leibniz Institute for Financial Research SAFE.
    4. Dufays, Arnaud & Rombouts, Jeroen V.K., 2020. "Relevant parameter changes in structural break models," Journal of Econometrics, Elsevier, vol. 217(1), pages 46-78.
    5. Juanjuan Zhang & Weixian Wang & Mingming Yang & Maozai Tian, 2025. "Variational Bayesian Variable Selection in Logistic Regression Based on Spike-and-Slab Lasso," Mathematics, MDPI, vol. 13(13), pages 1-18, July.
    6. M. Marsman & K. Huth & L. J. Waldorp & I. Ntzoufras, 2022. "Objective Bayesian Edge Screening and Structure Selection for Ising Networks," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 47-82, March.
    7. Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    8. Uddin, Md Nazir & Gaskins, Jeremy T., 2023. "Shared Bayesian variable shrinkage in multinomial logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
    9. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    10. Nicholas G. Polson & James G. Scott, 2016. "Mixtures, envelopes and hierarchical duality," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 701-727, September.
    11. McLain, Alexander C. & Zgodic, Anja & Bondell, Howard, 2025. "Efficient sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 207(C).
    12. Deborah Gefang & Gary Koop & Aubrey Poon, 2019. "Variational Bayesian Inference in Large Vector Autoregressions with Hierarchical Shrinkage," Discussion Papers in Economics 19/05, Division of Economics, School of Business, University of Leicester.
    13. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
    14. R. Alhamzawi & K. Yu & D. F. Benoit, 2011. "Bayesian adaptive Lasso quantile regression," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 11/728, Ghent University, Faculty of Economics and Business Administration.
    15. Sweata Sen & Damitri Kundu & Kiranmoy Das, 2023. "Variable selection for categorical response: a comparative study," Computational Statistics, Springer, vol. 38(2), pages 809-826, June.
    16. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
    17. Li, Jiahan & Chen, Weiye, 2014. "Forecasting macroeconomic time series: LASSO-based approaches and their forecast combinations with dynamic factor models," International Journal of Forecasting, Elsevier, vol. 30(4), pages 996-1015.
    18. Mike K. P. So & Wing Ki Liu & Amanda M. Y. Chu, 2018. "Bayesian Shrinkage Estimation Of Time-Varying Covariance Matrices In Financial Time Series," Advances in Decision Sciences, Asia University, Taiwan, vol. 22(1), pages 369-404, December.
    19. Oguzhan Cepni & I. Ethem Guney & Norman R. Swanson, 2020. "Forecasting and nowcasting emerging market GDP growth rates: The role of latent global economic policy uncertainty and macroeconomic data surprise factors," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(1), pages 18-36, January.
    20. Nicolas Städler & Peter Bühlmann & Sara Geer, 2010. "ℓ1-penalization for mixture regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 19(2), pages 209-256, August.
