
Pólya‐gamma data augmentation and latent variable models for multivariate binomial data

Author

Listed:
  • John B. Holmes
  • Matthew R. Schofield
  • Richard J. Barker

Abstract

The New Zealand Police have long been suspected of systematic bias against the indigenous Māori. One resource available to investigate this possibility is the annual counts of police apprehensions and prosecutions, by offence type. However, model specification and fitting are complicated because these data are constrained counts that are interdependent and multivariate. For example, there are limited options for factor models beyond continuous or binary data. This is a serious limitation because, in our dataset, although measurements are clustered, different individuals are measured at each variable. Focusing on principal component/factor analysis representations, we show that, under the canonical logit link, latent variable models can be fitted via Gibbs sampling to multivariate binomial data of arbitrary trial size by applying Pólya‐gamma augmentation to the binomial likelihood. We demonstrate that this modelling approach, by incorporating shrinkage, will produce a fit with lower mean square error than techniques based on deviance minimization commonly employed for binary datasets. By exploring theoretical properties of the proposed models, we demonstrate that a larger range of latent structures can be estimated and that the presence of hidden replication improves prediction when data are multivariate binomial, which gives us greater flexibility for investigating associations between ethnicity and prosecution probability.
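
As a rough illustration of the Pólya‐gamma augmentation step mentioned in the abstract, the sketch below runs a Gibbs sampler for a much simpler model than the paper's: binomial counts sharing a single logit intercept with a normal prior. It is not the authors' latent variable model. The function names (`sample_pg`, `gibbs_logit_intercept`), the prior variance, and the truncated sum-of-gammas approximation to the Pólya‐gamma draw are all assumptions made for this example, and every trial size is assumed to be at least 1.

```python
import math
import random


def sample_pg(b, c, rng, terms=100):
    """Approximate draw from PG(b, c) using a truncated sum-of-gammas series.

    Hypothetical helper for illustration; truncation makes the draw
    slightly too small, so this is not an exact sampler.
    """
    shift = c * c / (4.0 * math.pi ** 2)
    total = 0.0
    for k in range(1, terms + 1):
        # Each series term uses an independent Gamma(b, 1) variate.
        total += rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + shift)
    return total / (2.0 * math.pi ** 2)


def gibbs_logit_intercept(y, n, prior_var=100.0, iters=600, burn=100, seed=0):
    """Gibbs sampler for y_i ~ Binomial(n_i, logistic(psi)), psi ~ N(0, prior_var)."""
    rng = random.Random(seed)
    # kappa_i = y_i - n_i / 2 is the linear term left after augmentation.
    kappa_sum = sum(yi - ni / 2.0 for yi, ni in zip(y, n))
    psi = 0.0
    draws = []
    for t in range(iters):
        # Step 1: omega_i | psi ~ PG(n_i, psi), one latent variable per count.
        omega_sum = sum(sample_pg(ni, psi, rng) for ni in n)
        # Step 2: psi | omega is Gaussian because the augmented
        # likelihood is proportional to exp(kappa * psi - omega * psi^2 / 2).
        post_var = 1.0 / (omega_sum + 1.0 / prior_var)
        post_mean = post_var * kappa_sum
        psi = rng.gauss(post_mean, math.sqrt(post_var))
        if t >= burn:
            draws.append(psi)
    return draws
```

Calling `gibbs_logit_intercept([3, 10, 7, 15], [10, 30, 20, 40])` returns posterior draws of the logit intercept; applying the logistic function to each draw gives draws of the common success probability. Because the trial size enters only through the first Pólya‐gamma parameter, the same two-step update applies for any trial size, which is the property the abstract relies on for multivariate binomial data.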

Suggested Citation

  • John B. Holmes & Matthew R. Schofield & Richard J. Barker, 2022. "Pólya‐gamma data augmentation and latent variable models for multivariate binomial data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(1), pages 194-218, January.
  • Handle: RePEc:bla:jorssc:v:71:y:2022:i:1:p:194-218
    DOI: 10.1111/rssc.12528

    References listed on IDEAS

    1. Brent A. Coull & Alan Agresti, 2000. "Random Effects Modeling of Multiple Binomial Responses Using the Multivariate Binomial Logit-Normal Distribution," Biometrics, The International Biometric Society, vol. 56(1), pages 73-80, March.
    2. P. Richard Hahn & Carlos M. Carvalho & James G. Scott, 2012. "A sparse factor analytic probit model for congressional voting patterns," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 61(4), pages 619-635, August.
    3. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    4. Asim Ansari & Kamel Jedidi, 2000. "Bayesian factor analysis for multilevel binary observations," Psychometrika, Springer;The Psychometric Society, vol. 65(4), pages 475-496, December.
    5. Caughey, Devin & Warshaw, Christopher, 2015. "Dynamic Estimation of Latent Opinion Using a Hierarchical Group-Level IRT Model," Political Analysis, Cambridge University Press, vol. 23(2), pages 197-211, April.
    6. Nick Patterson & Alkes L Price & David Reich, 2006. "Population Structure and Eigenanalysis," PLOS Genetics, Public Library of Science, vol. 2(12), pages 1-20, December.
    7. Nicholas G. Polson & James G. Scott & Jesse Windle, 2013. "Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1339-1349, December.
    8. repec:bfi:wpaper:2014-014 is not listed on IDEAS
    9. Michael E. Tipping & Christopher M. Bishop, 1999. "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 611-622.
    10. de Leeuw, Jan, 2006. "Principal component analysis of binary data by iterated singular value decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 21-39, January.
    11. Li Cai, 2010. "Metropolis-Hastings Robbins-Monro Algorithm for Confirmatory Item Factor Analysis," Journal of Educational and Behavioral Statistics, vol. 35(3), pages 307-335, June.
    12. Anders Christoffersson, 1975. "Factor analysis of dichotomized variables," Psychometrika, Springer;The Psychometric Society, vol. 40(1), pages 5-32, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lorenzo Schiavon, 2025. "Addressing topic modelling via reduced latent space clustering," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 34(1), pages 1-20, March.
    2. Veronika Ročková & Edward I. George, 2016. "Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1608-1622, October.
    3. Aman Agrawal & Alec M Chiu & Minh Le & Eran Halperin & Sriram Sankararaman, 2020. "Scalable probabilistic PCA for large-scale genetic variation data," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-19, May.
    4. Ji Seung Yang & Li Cai, 2014. "Estimation of Contextual Effects Through Nonlinear Multilevel Latent Variable Modeling With a Metropolis–Hastings Robbins–Monro Algorithm," Journal of Educational and Behavioral Statistics, vol. 39(6), pages 550-582, December.
    5. Landgraf, Andrew J. & Lee, Yoonkyung, 2020. "Dimensionality reduction for binary data through the projection of natural parameters," Journal of Multivariate Analysis, Elsevier, vol. 180(C).
    6. Buddhavarapu, Prasad & Bansal, Prateek & Prozzi, Jorge A., 2021. "A new spatial count data model with time-varying parameters," Transportation Research Part B: Methodological, Elsevier, vol. 150(C), pages 566-586.
    7. Niko Hauzenberger & Florian Huber, 2020. "Model instability in predictive exchange rate regressions," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(2), pages 168-186, March.
    8. Rubén Loaiza-Maya & Didier Nibbering, 2022. "Fast variational Bayes methods for multinomial probit models," Papers 2202.12495, arXiv.org, revised Oct 2022.
    9. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    10. Anindya Bhadra & Arvind Rao & Veerabhadran Baladandayuthapani, 2018. "Inferring network structure in non‐normal and mixed discrete‐continuous genomic data," Biometrics, The International Biometric Society, vol. 74(1), pages 185-195, March.
    11. Haoying Wang & Guohui Wu, 2022. "Modeling discrete choices with large fine-scale spatial data: opportunities and challenges," Journal of Geographical Systems, Springer, vol. 24(3), pages 325-351, July.
    12. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    13. Estavoyer, Maxime & François, Olivier, 2022. "Theoretical analysis of principal components in an umbrella model of intraspecific evolution," Theoretical Population Biology, Elsevier, vol. 148(C), pages 11-21.
    14. Luo, Nanyu & Ji, Feng & Han, Yuting & He, Jinbo & Zhang, Xiaoya, 2024. "Fitting item response theory models using deep learning computational frameworks," OSF Preprints tjxab, Center for Open Science.
    15. Matteo Barigozzi & Marc Hallin, 2023. "Dynamic Factor Models: a Genealogy," Papers 2310.17278, arXiv.org, revised Jan 2024.
    16. Chen, Andrew Y. & McCoy, Jack, 2024. "Missing values handling for machine learning portfolios," Journal of Financial Economics, Elsevier, vol. 155(C).
    17. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    18. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    19. Ambroise Wonkam & Kevin Esoh & Rachel M. Levine & Valentina Josiane Ngo Bitoungui & Khuthala Mnika & Nikitha Nimmagadda & Erin A. D. Dempsey & Siana Nkya & Raphael Z. Sangeda & Victoria Nembaware & Ja, 2025. "FLT1 and other candidate fetal haemoglobin modifying loci in sickle cell disease in African ancestries," Nature Communications, Nature, vol. 16(1), pages 1-21, December.
    20. Yang Yixin & Lü Xin & Ma Jian & Qiao Han, 2014. "A Robust Factor Analysis Model for Dichotomous Data," Journal of Systems Science and Information, De Gruyter, vol. 2(5), pages 437-450, October.
