
Multi-Stage Methods for Cost Controlled Data Compression Using Principal Component Analysis

Author

Listed:
  • Swarnali Banerjee

    (Department of Mathematics and Statistics and Center for Data Science and Consulting, Loyola University, Chicago, IL 60660, USA)

  • Bhargab Chattopadhyay

    (School of Management & Entrepreneurship, Indian Institute of Technology Jodhpur, Jodhpur 342030, Rajasthan, India)

Abstract

Several online principal component analysis (PCA) methodologies exist for sequentially arriving data, but they focus only on minimizing compression risk. Recent work in this area instead minimizes the cost-compression risk, which accounts for both compression loss and sampling cost, using a two-stage PCA procedure. Although that procedure enjoys first-order efficiency, the authors could not mathematically verify that it is second-order efficient. In this article, we minimize the cost-compression risk using a modified two-stage PCA procedure for data assumed to follow a multivariate normal distribution, when the smallest eigenvalue of the population variance–covariance matrix, or a positive lower bound for it, is known. The modified two-stage PCA procedure is shown to possess the second-order efficiency property, among others, including the second-order risk efficiency property under some conditions. The proposed method is not only novel but also fast and efficient, as illustrated by extensive simulations and real data analysis.
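The abstract does not state the risk function or the stopping rule, so the following is only a rough sketch of how a two-stage sample-size rule with a known eigenvalue lower bound might look, not the authors' procedure. It assumes a toy risk of the form R(n) = A·θ/n + c·n, where θ is taken to be the sum of the p − k smallest eigenvalues (the variance discarded by keeping k components), A is a loss weight, and c is the per-observation sampling cost. The function names, the risk form, and the parameters `A`, `c`, `lam_lower`, and `m0` are all hypothetical.

```python
import math
import numpy as np

def discarded_variance(X, k):
    """Sum of the p - k smallest eigenvalues of the sample covariance of X (n x p)."""
    cov = np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return float(eigvals[: X.shape[1] - k].sum())

def two_stage_sample_size(draw, p, k, A, c, lam_lower, m0=10):
    """Illustrative two-stage rule under the ASSUMED risk R(n) = A*theta/n + c*n.

    draw(n) must return an (n x p) array of fresh observations.
    lam_lower is a known positive lower bound on the smallest eigenvalue,
    so theta >= (p - k) * lam_lower.
    """
    # Stage 1: the lower bound on theta fixes a pilot size that grows as c shrinks.
    theta_lower = (p - k) * lam_lower
    m = max(m0, math.floor(math.sqrt(A * theta_lower / c)) + 1)
    pilot = draw(m)

    # Stage 2: plug the pilot estimate of theta into the risk-minimizing
    # size n* = sqrt(A * theta / c), never going below the pilot size.
    theta_hat = discarded_variance(pilot, k)
    N = max(m, math.floor(math.sqrt(A * theta_hat / c)) + 1)

    # Collect the remaining N - m observations only if they are needed.
    extra = draw(N - m) if N > m else np.empty((0, p))
    return m, N, np.vstack([pilot, extra])
```

With a sampler such as `draw = lambda n: rng.multivariate_normal(mean, cov, size=n)`, the call `two_stage_sample_size(draw, p, k, A, c, lam_lower)` returns the pilot size, the final sample size, and the pooled data on which PCA would then be run. Tying the pilot size to the known lower bound is the device that two-stage procedures of this type typically rely on to obtain second-order properties; whether the paper uses exactly this form is not stated in the abstract.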

Suggested Citation

  • Swarnali Banerjee & Bhargab Chattopadhyay, 2025. "Multi-Stage Methods for Cost Controlled Data Compression Using Principal Component Analysis," Mathematics, MDPI, vol. 13(13), pages 1-15, June.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:13:p:2140-:d:1691185

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/13/2140/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/13/2140/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Leng-Cheng Hwang & Chia-Chen Yang, 2015. "A robust two-stage procedure in Bayes sequential estimation of a particular exponential family," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 78(2), pages 145-159, February.
    2. Nitis Mukhopadhyay & Anhar Aloufi, 2024. "Second-order (s.o.) multi-stage fixed-width confidence interval (FWCI) estimation strategies for comparing location parameters from two negative exponential (NE) populations: illustrations with cancer," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 87(6), pages 649-680, August.
    3. Makoto Aoshima & Kazuyoshi Yata, 2019. "Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 473-503, June.
    4. Aoshima, Makoto & Mukhopadhyay, Nitis, 1998. "Fixed-Width Simultaneous Confidence Intervals for Multinormal Means in Several Intraclass Correlation Models," Journal of Multivariate Analysis, Elsevier, vol. 66(1), pages 46-63, July.
    5. Hokwon Cho, 2019. "Two-Stage Procedure of Fixed-Width Confidence Intervals for the Risk Ratio," Methodology and Computing in Applied Probability, Springer, vol. 21(3), pages 721-733, September.
    6. Nitis Mukhopadhyay & Soumik Banerjee, 2023. "A General Theory of Three-Stage Estimation Strategy with Second-Order Asymptotics and Its Applications," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 401-440, February.
    7. Sen, Pranab K. & Tsai, Ming-Tien, 1999. "Two-Stage Likelihood Ratio and Union-Intersection Tests for One-Sided Alternatives Multivariate Mean with Nuisance Dispersion Matrix," Journal of Multivariate Analysis, Elsevier, vol. 68(2), pages 264-282, February.
    8. Bando, Takuma & Sei, Tomonari & Yata, Kazuyoshi, 2022. "Consistency of the objective general index in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    9. Kazuyoshi Yata & Makoto Aoshima, 2020. "Geometric consistency of principal component scores for high‐dimensional mixture models and its application," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 899-921, September.
    10. Huang, Shih-Hao & Huang, Su-Yun, 2021. "On the asymptotic normality and efficiency of Kronecker envelope principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    11. Wang, Shao-Hsuan & Huang, Su-Yun & Chen, Ting-Li, 2020. "On asymptotic normality of cross data matrix-based PCA in high dimension low sample size," Journal of Multivariate Analysis, Elsevier, vol. 175(C).
    12. Jonathan Gillard & Emily O’Riordan & Anatoly Zhigljavsky, 2023. "Polynomial whitening for high-dimensional data," Computational Statistics, Springer, vol. 38(3), pages 1427-1461, September.
    13. Okudo, Michiko & Komaki, Fumiyasu, 2021. "Shrinkage priors for single-spiked covariance models," Statistics & Probability Letters, Elsevier, vol. 176(C).
    14. Nitis Mukhopadhyay & William Duggan, 1999. "On a Two-Stage Procedure Having Second-Order Properties with Applications," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 51(4), pages 621-636, December.
