Printed from https://ideas.repec.org/a/eee/phsmap/v658y2025ics0378437124008185.html

Optimal subset selection for distributed local principal component analysis

Author

Listed:
  • Guo, Guangbao
  • Qian, Guoqi

Abstract

Because distributed PCA methods can produce large local approximation errors, we propose a distributed PCA method, called distributed local PCA (DLPCA), that reduces this error through dimensionality reduction guided by an optimal subset selection criterion. The advantages of optimal subset selection for DLPCA include enhanced accuracy through precise covariance estimation, efficiency in handling large data sets, scalability in scenarios where variables outnumber nodes, robustness against outliers, flexibility in parameter selection, and adaptability across data distributions. The low-dimensional covariance sub-estimators are obtained by computing local principal components in a distributed manner, and a one-step average covariance estimator is then computed. Mean squared error (MSE) is used to measure the performance of the proposed method, and the optimal sub-estimator under the criterion is the one with the minimum MSE among all covariance sub-estimators. The proposed method is shown not only to improve estimation accuracy in both simulated and real-data experiments but also to greatly reduce computing time, which makes it promising for big data problems.
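
The pipeline outlined in the abstract (local principal components per node, a one-step average, and minimum-MSE selection among sub-estimators) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract does not give the estimator's exact form, so the row-wise split into nodes, the rank-r reconstruction, the reference covariance used for the MSE, and the function names `local_low_rank_cov` and `select_sub_estimator` are all assumptions made for illustration.

```python
import numpy as np

def local_low_rank_cov(X, r):
    """Rank-r covariance sub-estimator from one node's top-r principal components.

    Assumption: the sub-estimator is V diag(lambda) V^T using the r leading
    eigenpairs of the local sample covariance.
    """
    S = np.cov(X, rowvar=False)        # local sample covariance (p x p)
    vals, vecs = np.linalg.eigh(S)     # eigenvalues returned in ascending order
    V = vecs[:, -r:]                   # eigenvectors of the r largest eigenvalues
    return (V * vals[-r:]) @ V.T

def select_sub_estimator(X, n_nodes, r, reference):
    """Split rows across hypothetical nodes, then pick the minimum-MSE sub-estimator.

    Assumption: MSE is the mean squared entrywise difference from `reference`
    (for example, the true covariance in a simulation).
    """
    blocks = np.array_split(X, n_nodes)
    subs = [local_low_rank_cov(B, r) for B in blocks]
    averaged = np.mean(subs, axis=0)   # one-step average of the sub-estimators
    mse = [float(np.mean((S - reference) ** 2)) for S in subs]
    best = int(np.argmin(mse))
    return subs[best], averaged, mse, best
```

In a simulation, `reference` could be the known population covariance; with real data some surrogate would be needed, and the paper should be consulted for the criterion it actually uses.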

Suggested Citation

  • Guo, Guangbao & Qian, Guoqi, 2025. "Optimal subset selection for distributed local principal component analysis," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 658(C).
  • Handle: RePEc:eee:phsmap:v:658:y:2025:i:c:s0378437124008185
    DOI: 10.1016/j.physa.2024.130308

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437124008185
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2024.130308?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    2. Kangqiang Li & Han Bao & Lixin Zhang, 2022. "Robust covariance estimation for distributed principal component analysis," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 85(6), pages 707-732, August.
    3. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    4. Aisha Fayomi & Yannis Pantazis & Michail Tsagris & Andrew Wood, 2023. "Cauchy Robust Principal Component Analysis with Applications to High-Dimensional Data Sets," Working Papers 2304, University of Crete, Department of Economics.
    5. Milana Gataric & Tengyao Wang & Richard J. Samworth, 2020. "Sparse principal component analysis via axis‐aligned random projections," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(2), pages 329-359, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yacine Aït-Sahalia & Dacheng Xiu, 2019. "Principal Component Analysis of High-Frequency Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(525), pages 287-303, January.
    2. Landgraf, Andrew J. & Lee, Yoonkyung, 2020. "Dimensionality reduction for binary data through the projection of natural parameters," Journal of Multivariate Analysis, Elsevier, vol. 180(C).
    3. Kou Fujimori & Yuichi Goto & Yan Liu & Masanobu Taniguchi, 2023. "Sparse principal component analysis for high‐dimensional stationary time series," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(4), pages 1953-1983, December.
    4. Juan Carlos Chávez & Felipe J. Fonseca & Manuel Gómez-Zaldívar, 2017. "Resoluciones de disputas comerciales y desempeño económico regional en México. (Commercial Disputes Resolution and Regional Economic Performance in Mexico)," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(1), pages 79-93, May.
    5. Chen, Ray-Bing & Chen, Ying & Härdle, Wolfgang K., 2014. "TVICA—Time varying independent component analysis and its application to financial data," Computational Statistics & Data Analysis, Elsevier, vol. 74(C), pages 95-109.
    6. Yan Yu Chen & Chun-Cheih Chao & Fu-Chen Liu & Po-Chen Hsu & Hsueh-Fen Chen & Shih-Chi Peng & Yung-Jen Chuang & Chung-Yu Lan & Wen-Ping Hsieh & David Shan Hill Wong, 2013. "Dynamic Transcript Profiling of Candida albicans Infection in Zebrafish: A Pathogen-Host Interaction Study," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.
    7. Tom Boot & Bart Keijsers, 2025. "Diffusion index forecasts under weaker loadings: PCA, ridge regression, and random projections," Papers 2506.09575, arXiv.org.
    8. Plat, Richard, 2009. "Stochastic portfolio specific mortality and the quantification of mortality basis risk," Insurance: Mathematics and Economics, Elsevier, vol. 45(1), pages 123-132, August.
    9. Puyi Fang & Zhaoxing Gao & Ruey S. Tsay, 2023. "Determination of the effective cointegration rank in high-dimensional time-series predictive regressions," Papers 2304.12134, arXiv.org, revised Apr 2023.
    10. Kondylis, Athanassios & Whittaker, Joe, 2008. "Spectral preconditioning of Krylov spaces: Combining PLS and PC regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(5), pages 2588-2603, January.
    11. M. J. Aziakpono & S. Kleimeier & H. Sander, 2012. "Banking market integration in the SADC countries: evidence from interest rate analyses," Applied Economics, Taylor & Francis Journals, vol. 44(29), pages 3857-3876, October.
    12. Bianca Maria Colosimo & Luca Pagani & Marco Grasso, 2024. "Modeling spatial point processes in video-imaging via Ripley’s K-function: an application to spatter analysis in additive manufacturing," Journal of Intelligent Manufacturing, Springer, vol. 35(1), pages 429-447, January.
    13. Ouyang, Yaofu & Li, Peng, 2018. "On the nexus of financial development, economic growth, and energy consumption in China: New perspective from a GMM panel VAR approach," Energy Economics, Elsevier, vol. 71(C), pages 238-252.
    14. Fan, Cheng & Sun, Yongjun & Zhao, Yang & Song, Mengjie & Wang, Jiayuan, 2019. "Deep learning-based feature engineering methods for improved building energy prediction," Applied Energy, Elsevier, vol. 240(C), pages 35-45.
    15. Ionela Munteanu & Adriana Grigorescu & Elena Condrea & Elena Pelinescu, 2020. "Convergent Insights for Sustainable Development and Ethical Cohesion: An Empirical Study on Corporate Governance in Romanian Public Entities," Sustainability, MDPI, vol. 12(7), pages 1-17, April.
    16. Daniel Boss & Annick Hoffmann & Benjamin Rappaz & Christian Depeursinge & Pierre J Magistretti & Dimitri Van de Ville & Pierre Marquet, 2012. "Spatially-Resolved Eigenmode Decomposition of Red Blood Cells Membrane Fluctuations Questions the Role of ATP in Flickering," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-10, August.
    17. Doukas, Haris & Papadopoulou, Alexandra & Savvakis, Nikolaos & Tsoutsos, Theocharis & Psarras, John, 2012. "Assessing energy sustainability of rural communities using Principal Component Analysis," Renewable and Sustainable Energy Reviews, Elsevier, vol. 16(4), pages 1949-1957.
    18. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
    19. Paschalis Arvanitidis & Athina Economou & Christos Kollias, 2016. "Terrorism’s effects on social capital in European countries," Public Choice, Springer, vol. 169(3), pages 231-250, December.
    20. Rizvi, Syed Kumail Abbas & Rahat, Birjees & Naqvi, Bushra & Umar, Muhammad, 2024. "Revolutionizing finance: The synergy of fintech, digital adoption, and innovation," Technological Forecasting and Social Change, Elsevier, vol. 200(C).

