Regularized multigroup exploratory approximate factor analysis for easy analysis of complex data

Author

Listed:
  • Van Deun, Katrijn
  • Lê, Trà T.
  • Malinowski, Jakub
  • Mols, Floortje
  • Schoormans, Dounya

Abstract

Exploring multigroup data for similarities and differences in the measurement model is a substantial part of the research conducted in the behavioral and social sciences. Examples include studying the measurement invariance of psychological scales over age or ethnic groups and comparing symptom correlations between different psychological disorders. Multigroup exploratory factor analysis is often the method of choice. However, currently available methods are restrictive in their use. First, these methods cannot handle complex data with small sample sizes relative to the number of variables, while high-dimension, low-sample-size data are increasingly used as a result of digitalization (e.g., word counts obtained by text mining of online messages or omics data). Second, the use of existing software is often arduous. Here, we propose a regularized exploratory approximate factor analysis method that addresses these issues by building on a strong computational framework: The resulting method yields solutions that are constrained to show simple structure and similarity of the loadings over groups when supported by the data. The minimal input required is restricted to the data and number of factors. In a simulation study, we show that the method considerably outperforms existing methods, also in the low-dimensional setting; publicly available genomics data on different psychopathologies are used to illustrate that the method works in the ultrahigh-dimensional setting. Implementation of the method in the R software language for statistical computing is publicly available on GitHub, including the code used to conduct the simulation study and to perform the analyses of the three empirical data sets.

Suggested Citation

  • Van Deun, Katrijn & Lê, Trà T. & Malinowski, Jakub & Mols, Floortje & Schoormans, Dounya, 2025. "Regularized multigroup exploratory approximate factor analysis for easy analysis of complex data," OSF Preprints 9twbk_v1, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:9twbk_v1
    DOI: 10.31219/osf.io/9twbk_v1

    Download full text from publisher

    File URL: https://osf.io/download/67c99db54558244421fd75ff/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rosember Guerra-Urzola & Niek C. Schipper & Anya Tonne & Klaas Sijtsma & Juan C. Vera & Katrijn Van Deun, 2023. "Sparsifying the least-squares approach to PCA: comparison of lasso and cardinality constraint," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 269-286, March.
    2. Matteo Barigozzi & Marc Hallin, 2024. "The Dynamic, the Static, and the Weak: Factor models and the analysis of high-dimensional time series," Papers 2407.10653, arXiv.org, revised May 2025.
    3. Diego Fresoli & Pilar Poncela & Esther Ruiz, 2024. "Dealing with idiosyncratic cross-correlation when constructing confidence regions for PC factors," Papers 2407.06883, arXiv.org.
    4. Michael Greenacre & Patrick J. F Groenen & Trevor Hastie & Alfonso Iodice d’Enza & Angelos Markos & Elena Tuzhilina, 2023. "Principal component analysis," Economics Working Papers 1856, Department of Economics and Business, Universitat Pompeu Fabra.
    5. He, Yong & Li, Lingxiao & Liu, Dong & Zhou, Wen-Xin, 2025. "Huber Principal Component Analysis for large-dimensional factor models," Journal of Econometrics, Elsevier, vol. 249(PB).
    6. Alex Shkolnik & Alec Kercheval & Hubeyb Gurdogan & Lisa R. Goldberg & Haim Bar, 2025. "Portfolio selection revisited," Annals of Operations Research, Springer, vol. 346(1), pages 137-155, March.
    7. Jianqing Fan & Yuling Yan & Yuheng Zheng, 2024. "When can weak latent factors be statistically inferred?," Papers 2407.03616, arXiv.org, revised Sep 2024.
    8. Zhongyuan Lyu & Ming Yuan, 2025. "Large-dimensional Factor Analysis with Weighted PCA," Papers 2508.15675, arXiv.org.
    9. Chen, Fangyi & Chen, Yunxiao & Ying, Zhiliang & Zhou, Kangjie, 2025. "Dynamic factor analysis of high-dimensional recurrent events," LSE Research Online Documents on Economics 127778, London School of Economics and Political Science, LSE Library.
    10. Ruiping Liu & Ndeye Niang & Gilbert Saporta & Huiwen Wang, 2023. "Sparse correspondence analysis for large contingency tables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(4), pages 1037-1056, December.
    11. Wissem Benaissa & Fatiha Saidi & Khadidja Rahmoun, 2025. "Utilizing data mining techniques for the design of structural and mechanical properties of ABX3 perovskites," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 98(7), pages 1-20, July.
    12. Mario Forni & Luca Gambetti & Luca Sala, 2014. "No News in Business Cycles," Economic Journal, Royal Economic Society, vol. 124(581), pages 1168-1191, December.
    13. Tae-Hwy Lee & Ekaterina Seregina, 2024. "Optimal Portfolio Using Factor Graphical Lasso," Journal of Financial Econometrics, Oxford University Press, vol. 22(3), pages 670-695.
    14. Tom Boot & Bart Keijsers, 2025. "Diffusion index forecasts under weaker loadings: PCA, ridge regression, and random projections," Papers 2506.09575, arXiv.org.
    15. Sven Otto & Nazarii Salish, 2022. "Approximate Factor Models for Functional Time Series," Papers 2201.02532, arXiv.org, revised Feb 2025.
    16. Poncela, Pilar & Ruiz Ortega, Esther, 2012. "More is not always better : back to the Kalman filter in dynamic factor models," DES - Working Papers. Statistics and Econometrics. WS ws122317, Universidad Carlos III de Madrid. Departamento de Estadística.
    17. Tomohiro Ando & Ruey S. Tsay, 2009. "Model selection for generalized linear models with factor‐augmented predictors," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 25(3), pages 207-235, May.
    18. Cavit Pakel & Neil Shephard & Kevin Sheppard & Robert F. Engle, 2021. "Fitting Vast Dimensional Time-Varying Covariance Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 39(3), pages 652-668, July.
    19. Zhaoxing Gao & Ruey S. Tsay, 2021. "Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data," Papers 2103.14626, arXiv.org.
    20. Molero-González, L. & Trinidad-Segovia, J.E. & Sánchez-Granero, M.A. & García-Medina, A., 2023. "Market Beta is not dead: An approach from Random Matrix Theory," Finance Research Letters, Elsevier, vol. 55(PA).
