Printed from https://ideas.repec.org/a/eee/csdana/v165y2022ics0167947321001596.html

Bayesian model selection for high-dimensional Ising models, with applications to educational data

Authors

  • Park, Jaewoo
  • Jin, Ick Hoon
  • Schweinberger, Michael

Abstract

Doubly-intractable posterior distributions arise in many applications of statistics concerned with discrete and dependent data, including physics, spatial statistics, machine learning, the social sciences, and other fields. A specific example is psychometrics, which has adapted high-dimensional Ising models from machine learning, with a view to studying the interactions among binary item responses in educational assessments. To estimate high-dimensional Ising models from educational assessment data, ℓ1-penalized nodewise logistic regressions have been used. Theoretical results in high-dimensional statistics show that ℓ1-penalized nodewise logistic regressions can recover the true interaction structure with high probability, provided that certain assumptions are satisfied. Those assumptions are hard to verify in practice and may be violated, and quantifying the uncertainty about the estimated interaction structure and parameter estimators is challenging. We propose a Bayesian approach that helps quantify the uncertainty about the interaction structure and parameters without requiring strong assumptions, and can be applied to Ising models with thousands of parameters. We demonstrate the advantages of the proposed Bayesian approach compared with ℓ1-penalized nodewise logistic regressions by simulation studies and applications to small and large educational data sets with up to 2,485 parameters. Among other things, the simulation studies suggest that the Bayesian approach is more robust against model misspecification due to omitted covariates than ℓ1-penalized nodewise logistic regressions.
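
To make the two technical ideas in the abstract concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes a common Ising parameterization for binary responses x in {0,1}^p with main effects alpha_i and pairwise interactions theta_ij, which the abstract does not spell out, and all function and variable names are hypothetical. The sketch shows (a) that the normalizing constant is a sum over 2^p configurations, which is what makes the likelihood, and hence the posterior, intractable for large p, and (b) that the conditional distribution of one item given the others is a logistic regression, which is the basis of nodewise logistic regression.

```python
import itertools
import math


def log_weight(x, alpha, theta):
    """Unnormalized log-probability of a binary vector x in {0,1}^p.

    Assumed form: sum_i alpha[i]*x[i] + sum_{i<j} theta[i][j]*x[i]*x[j],
    with theta stored as an upper-triangular p x p nested list.
    """
    p = len(x)
    main = sum(alpha[i] * x[i] for i in range(p))
    pair = sum(theta[i][j] * x[i] * x[j]
               for i in range(p) for j in range(i + 1, p))
    return main + pair


def normalizer(alpha, theta):
    """Brute-force normalizing constant: a sum over all 2^p configurations."""
    p = len(alpha)
    return sum(math.exp(log_weight(x, alpha, theta))
               for x in itertools.product((0, 1), repeat=p))


def conditional_from_joint(i, x, alpha, theta):
    """P(x_i = 1 | other items), computed from the joint weights."""
    x1 = list(x)
    x1[i] = 1
    x0 = list(x)
    x0[i] = 0
    w1 = math.exp(log_weight(x1, alpha, theta))
    w0 = math.exp(log_weight(x0, alpha, theta))
    return w1 / (w1 + w0)  # the normalizing constant cancels here


def conditional_logistic(i, x, alpha, theta):
    """The same conditional probability, written as a logistic regression."""
    eta = alpha[i] + sum(theta[min(i, j)][max(i, j)] * x[j]
                         for j in range(len(x)) if j != i)
    return 1.0 / (1.0 + math.exp(-eta))


# Illustrative (made-up) parameter values for p = 3 items.
ALPHA = [-0.5, 0.2, 0.1]
THETA = [[0.0, 0.8, 0.0],
         [0.0, 0.0, -0.4],
         [0.0, 0.0, 0.0]]

if __name__ == "__main__":
    print("normalizing constant:", normalizer(ALPHA, THETA))
    x = (1, 0, 1)
    for i in range(3):
        print(i, conditional_from_joint(i, x, ALPHA, THETA),
              conditional_logistic(i, x, ALPHA, THETA))
```

The brute-force sum is only feasible for very small p. Under this assumed parameterization, p items give p + p(p-1)/2 parameters; p = 70 would give 2,485, the figure quoted in the abstract, although the abstract itself does not state the number of items.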

Suggested Citation

  • Park, Jaewoo & Jin, Ick Hoon & Schweinberger, Michael, 2022. "Bayesian model selection for high-dimensional Ising models, with applications to educational data," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
  • Handle: RePEc:eee:csdana:v:165:y:2022:i:c:s0167947321001596
    DOI: 10.1016/j.csda.2021.107325

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321001596
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2021.107325?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    2. Qifan Song & Guang Cheng, 2020. "Bayesian Fusion Estimation via t Shrinkage," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(2), pages 353-385, August.
    3. Martin Feldkircher & Florian Huber & Gary Koop & Michael Pfarrhofer, 2022. "APPROXIMATE BAYESIAN INFERENCE AND FORECASTING IN HUGE‐DIMENSIONAL MULTICOUNTRY VARs," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1625-1658, November.
    4. Hauzenberger, Niko, 2021. "Flexible Mixture Priors for Large Time-varying Parameter Models," Econometrics and Statistics, Elsevier, vol. 20(C), pages 87-108.
    5. Chan, Joshua C.C., 2021. "Minnesota-type adaptive hierarchical priors for large Bayesian VARs," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1212-1226.
    6. Domenico Giannone & Michele Lenza & Giorgio E. Primiceri, 2021. "Economic Predictions With Big Data: The Illusion of Sparsity," Econometrica, Econometric Society, vol. 89(5), pages 2409-2437, September.
    7. Anindya Bhadra, 2022. "Discussion to: Bayesian graphical models for modern biological applications by Y. Ni, V. Baladandayuthapani, M. Vannucci and F.C. Stingo," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(2), pages 235-239, June.
    8. Debamita Kundu & Riten Mitra & Jeremy T. Gaskins, 2021. "Bayesian variable selection for multioutcome models through shared shrinkage," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 295-320, March.
    9. Hu, Guanyu, 2021. "Spatially varying sparsity in dynamic regression models," Econometrics and Statistics, Elsevier, vol. 17(C), pages 23-34.
    10. Lee Anthony & Caron Francois & Doucet Arnaud & Holmes Chris, 2012. "Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-31, January.
    11. Robert B. Gramacy, 2020. "Discussion," International Statistical Review, International Statistical Institute, vol. 88(2), pages 326-329, August.
    12. James C. Russell & Ephraim M. Hanks & Murali Haran, 2016. "Dynamic Models of Animal Movement with Spatial Point Process Interactions," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(1), pages 22-40, March.
    13. Michael Pfarrhofer, 2020. "Forecasts with Bayesian vector autoregressions under real time conditions," Papers 2004.04984, arXiv.org.
    14. Anindya Bhadra & Jyotishka Datta & Nicholas G. Polson & Brandon T. Willard, 2020. "Global-Local Mixtures: A Unifying Framework," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(2), pages 426-447, August.
    15. Yu Bai & Andrea Carriero & Todd E. Clark & Massimiliano Marcellino, 2022. "Macroeconomic forecasting in a multi‐country context," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1230-1255, September.
    16. Chakraborty, Saptarshi & Bhattacharya, Suman K. & Khare, Kshitij, 2022. "Estimating accuracy of the MCMC variance estimator: Asymptotic normality for batch means estimators," Statistics & Probability Letters, Elsevier, vol. 183(C).
    17. Xueying Tang & Xiaofan Xu & Malay Ghosh & Prasenjit Ghosh, 2018. "Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(2), pages 215-246, August.
    18. Mingan Yang & Min Wang & Guanghui Dong, 2020. "Bayesian variable selection for mixed effects model with shrinkage prior," Computational Statistics, Springer, vol. 35(1), pages 227-243, March.
    19. Gefang, Deborah & Koop, Gary & Poon, Aubrey, 2023. "Forecasting using variational Bayesian inference in large vector autoregressions with hierarchical shrinkage," International Journal of Forecasting, Elsevier, vol. 39(1), pages 346-363.
    20. Deborah Gefang & Gary Koop & Aubrey Poon, 2019. "Variational Bayesian Inference in Large Vector Autoregressions with Hierarchical Shrinkage," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2019-07, Economic Statistics Centre of Excellence (ESCoE).
