Printed from https://ideas.repec.org/a/eee/jmvana/v180y2020ics0047259x20302499.html

Dimensionality reduction for binary data through the projection of natural parameters

Authors

  • Landgraf, Andrew J.
  • Lee, Yoonkyung

Abstract

Principal component analysis (PCA) for binary data, known as logistic PCA, has become a popular method for dimensionality reduction of binary data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the Bernoulli log-likelihood. We propose a new formulation of logistic PCA which extends Pearson's formulation of a low-dimensional data representation with minimum error to binary data. Our formulation does not require a matrix factorization, as previous methods do, but instead looks for projections of the natural parameters from the saturated model. Because of this difference, the number of parameters does not grow with the number of observations, and the principal component scores on new data can be computed with a simple matrix multiplication. We derive explicit solutions for data matrices of special structure and provide a computationally efficient algorithm for solving for the principal component loadings. Through simulation experiments and an analysis of medical diagnosis data, we compare our formulation of logistic PCA to the previous formulation as well as to ordinary PCA to demonstrate its benefits.
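
The abstract describes the estimator only in words. Below is a minimal NumPy sketch of the idea, under stated assumptions: form saturated-model natural parameters theta_sat = m(2x - 1), project them (after subtracting an offset mu) onto a k-dimensional subspace with orthonormal loadings U, and choose U to minimize the Bernoulli deviance. The fitting loop uses a quadratic majorization (MM) step based on the bound p(1 - p) <= 1/4, in the spirit of the Hunter & Lange tutorial listed among the references. The tuning value m = 4, fixing mu at the column logits, and the function names `logistic_pca` and `scores` are hypothetical choices for illustration, not the authors' published algorithm or software.

```python
import numpy as np


def logistic_pca(X, k, m=4.0, n_iter=100):
    """Sketch: logistic PCA by projecting saturated natural parameters.

    X: (n, d) array of 0/1 values; k: number of components.
    m: hypothetical tuning constant standing in for +/- infinity in the
       saturated model (value chosen only for illustration).
    Returns (mu, U, devs) where devs is the deviance after each step.
    """
    X = np.asarray(X, dtype=float)
    theta_sat = m * (2.0 * X - 1.0)            # saturated natural parameters
    # Assumption: keep the offset mu fixed at the logit of column proportions.
    p_col = np.clip(X.mean(axis=0), 1e-3, 1 - 1e-3)
    mu = np.log(p_col / (1.0 - p_col))
    theta_c = theta_sat - mu                    # centred saturated parameters
    # Initialise U with ordinary PCA of the centred saturated parameters.
    _, vecs = np.linalg.eigh(theta_c.T @ theta_c)
    U = vecs[:, -k:]                            # eigh sorts eigenvalues ascending

    def deviance(U):
        theta = mu + theta_c @ U @ U.T          # projected natural parameters
        # -2 * Bernoulli log-likelihood, computed stably with logaddexp
        return 2.0 * np.sum(np.logaddexp(0.0, theta) - X * theta)

    devs = [deviance(U)]
    for _ in range(n_iter):
        theta = mu + theta_c @ U @ U.T
        p = 1.0 / (1.0 + np.exp(-theta))
        # Majorize each deviance term by a quadratic with curvature 1/4;
        # this yields the working response z = theta + 4 (x - p).
        z_c = theta + 4.0 * (X - p) - mu
        # Minimising ||z_c - theta_c U U^T||_F^2 over orthonormal U amounts to
        # taking the top-k eigenvectors of this symmetric matrix.
        M = theta_c.T @ z_c + z_c.T @ theta_c - theta_c.T @ theta_c
        _, vecs = np.linalg.eigh(M)
        U = vecs[:, -k:]
        devs.append(deviance(U))
    return mu, U, devs


def scores(X_new, mu, U, m=4.0):
    """Principal component scores for new rows: one matrix product."""
    theta_sat = m * (2.0 * np.asarray(X_new, dtype=float) - 1.0)
    return (theta_sat - mu) @ U


# Usage on synthetic binary data (illustrative only).
rng = np.random.default_rng(0)
X = (rng.random((200, 10)) < 0.3).astype(float)
mu, U, devs = logistic_pca(X, k=2)
S = scores(X[:5], mu, U)
```

Because each MM step minimizes an upper bound that touches the deviance at the current iterate, the recorded deviances should not increase. Scoring new rows needs only (theta_sat - mu) @ U, with no per-observation parameters to re-estimate, which is the property the abstract contrasts with factorization-based logistic PCA.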

Suggested Citation

  • Landgraf, Andrew J. & Lee, Yoonkyung, 2020. "Dimensionality reduction for binary data through the projection of natural parameters," Journal of Multivariate Analysis, Elsevier, vol. 180(C).
  • Handle: RePEc:eee:jmvana:v:180:y:2020:i:c:s0047259x20302499
    DOI: 10.1016/j.jmva.2020.104668

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X20302499
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jmva.2020.104668?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Hunter, D.R. & Lange, K., 2004. "A Tutorial on MM Algorithms," The American Statistician, American Statistical Association, vol. 58, pages 30-37, February.
    2. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    3. Tipping, Michael E. & Bishop, Christopher M., 1999. "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 611-622.
    4. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    5. de Leeuw, Jan, 2006. "Principal component analysis of binary data by iterated singular value decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 21-39, January.
    6. Eckart, Carl & Young, Gale, 1936. "The approximation of one matrix by another of lower rank," Psychometrika, Springer;The Psychometric Society, vol. 1(3), pages 211-218, September.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Scott W. Hegerty & Arkadiusz M. Kowalski & Małgorzata S. Lewandowska, 2023. "Complementarity of additionalities resulting from European Union funds: Perspective of the users of research infrastructures," Review of Policy Research, Policy Studies Organization, vol. 40(2), pages 307-331, March.
    2. María Elena Murrieta-Oquendo & Iván Manuel De la Vega, 2022. "State and Dynamics of the Innovative Performance of Medium and Large Firms in the Manufacturing Sector in Emerging Economies: The Cases of Peru and Ecuador," Sustainability, MDPI, vol. 15(1), pages 1-23, December.
    3. Jose Giovany Babativa-Márquez & José Luis Vicente-Villardón, 2021. "Logistic Biplot by Conjugate Gradient Algorithms and Iterated SVD," Mathematics, MDPI, vol. 9(16), pages 1-19, August.
    4. Andrea Wysocki & Jenna Libersky & Jonathan Gellar & Dean Miller & Su Liu & Margaret Luo & Alena Tourtellotte & Debra Lipson, "undated". "Medicaid Managed Long-Term Services and Supports: Summative Evaluation Report," Mathematica Policy Research Reports c11850aef1eb4101b78347191, Mathematica Policy Research.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Benaych-Georges, Florent & Nadakuditi, Raj Rao, 2012. "The singular values and vectors of low rank perturbations of large rectangular random matrices," Journal of Multivariate Analysis, Elsevier, vol. 111(C), pages 120-135.
    2. Dorota Toczydlowska & Gareth W. Peters & Man Chung Fung & Pavel V. Shevchenko, 2017. "Stochastic Period and Cohort Effect State-Space Mortality Models Incorporating Demographic Factors via Probabilistic Robust Principal Components," Risks, MDPI, vol. 5(3), pages 1-77, July.
    3. Chen, Tao & Martin, Elaine & Montague, Gary, 2009. "Robust probabilistic PCA with missing data and contribution analysis for outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3706-3716, August.
    4. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    5. Marconi, Gabriele, 2014. "European higher education policies and the problem of estimating a complex model with a small cross-section," MPRA Paper 87600, University Library of Munich, Germany.
    6. Merola, Giovanni Maria & Chen, Gemai, 2019. "Projection sparse principal component analysis: An efficient least squares method," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 366-382.
    7. Matteo Barigozzi, 2023. "Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models," Papers 2307.09864, arXiv.org, revised Sep 2023.
    8. Johannes Burge & Priyank Jaini, 2017. "Accuracy Maximization Analysis for Sensory-Perceptual Tasks: Computational Improvements, Filter Robustness, and Coding Advantages for Scaled Additive Noise," PLOS Computational Biology, Public Library of Science, vol. 13(2), pages 1-32, February.
    9. Tuncer, Yalcin & Tanik, Murat M. & Allison, David B., 2008. "An overview of statistical decomposition techniques applied to complex systems," Computational Statistics & Data Analysis, Elsevier, vol. 52(5), pages 2292-2310, January.
    10. Dray, Stephane, 2008. "On the number of principal components: A test of dimensionality based on measurements of similarity between matrices," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2228-2237, January.
    11. Shen, Dan & Shen, Haipeng & Marron, J.S., 2013. "Consistency of sparse PCA in High Dimension, Low Sample Size contexts," Journal of Multivariate Analysis, Elsevier, vol. 115(C), pages 317-333.
    12. Gaofeng Jia & Alexandros A. Taflanidis & Norberto C. Nadal-Caraballo & Jeffrey A. Melby & Andrew B. Kennedy & Jane M. Smith, 2016. "Surrogate modeling for peak or time-dependent storm surge prediction over an extended coastal region using an existing database of synthetic storms," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 81(2), pages 909-938, March.
    13. Yacine Aït-Sahalia & Dacheng Xiu, 2019. "Principal Component Analysis of High-Frequency Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(525), pages 287-303, January.
    14. Hirose, Kei & Fujisawa, Hironori & Sese, Jun, 2017. "Robust sparse Gaussian graphical modeling," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 172-190.
    15. Hong, David & Balzano, Laura & Fessler, Jeffrey A., 2018. "Asymptotic performance of PCA for high-dimensional heteroscedastic data," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 435-452.
    16. Beramendi, Pablo & Stegmueller, Daniel, 2016. "The Political Geography Of The Eurocrisis," CAGE Online Working Paper Series 278, Competitive Advantage in the Global Economy (CAGE).
    17. de Leeuw, Jan & Lange, Kenneth, 2009. "Sharp quadratic majorization in one dimension," Computational Statistics & Data Analysis, Elsevier, vol. 53(7), pages 2471-2484, May.
    18. Bai, Jushan & Ng, Serena, 2019. "Rank regularized estimation of approximate factor models," Journal of Econometrics, Elsevier, vol. 212(1), pages 78-96.
    19. Rosember Guerra-Urzola & Katrijn Van Deun & Juan C. Vera & Klaas Sijtsma, 2021. "A Guide for Sparse PCA: Model Comparison and Applications," Psychometrika, Springer;The Psychometric Society, vol. 86(4), pages 893-919, December.
    20. Shen, Haipeng & Huang, Jianhua Z., 2008. "Sparse principal component analysis via regularized low rank matrix approximation," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1015-1034, July.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:jmvana:v:180:y:2020:i:c:s0047259x20302499. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/wps/find/journaldescription.cws_home/622892/description#description

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.