IDEAS home Printed from https://ideas.repec.org/a/inm/orijoc/v37y2025i3p582-602.html

Exact and Approximation Algorithms for Sparse Principal Component Analysis

Author

Listed:
  • Yongchun Li

    (Department of Industrial & Systems Engineering, The University of Tennessee, Knoxville, Tennessee 37996)

  • Weijun Xie

    (H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech, Atlanta, Georgia 30332)

Abstract

Sparse principal component analysis (SPCA) is designed to enhance the interpretability of traditional principal component analysis by optimally selecting a subset of features that comprise the first principal component. Given the NP-hard nature of SPCA, most current approaches resort to approximate solutions, typically achieved through tractable semidefinite programs or heuristic methods. To solve SPCA to optimality, we propose two exact mixed-integer semidefinite programs (MISDPs) and an arbitrarily equivalent mixed-integer linear program. The MISDPs allow us to design an effective branch-and-cut algorithm with closed-form cuts that do not need to solve dual problems. For the proposed mixed-integer formulations, we further derive the theoretical optimality gaps of their continuous relaxations. Besides, we apply the greedy and local search algorithms to solving SPCA and derive their first-known approximation ratios. Our numerical experiments reveal that the exact methods we developed can efficiently find optimal solutions for data sets containing hundreds of features. Furthermore, our approximation algorithms demonstrate both scalability and near-optimal performance when benchmarked on larger data sets, specifically those with thousands of features.

Suggested Citation

  • Yongchun Li & Weijun Xie, 2025. "Exact and Approximation Algorithms for Sparse Principal Component Analysis," INFORMS Journal on Computing, INFORMS, vol. 37(3), pages 582-602, May.
  • Handle: RePEc:inm:orijoc:v:37:y:2025:i:3:p:582-602
    DOI: 10.1287/ijoc.2022.0372
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2022.0372
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2022.0372?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Harrison, David Jr. & Rubinfeld, Daniel L., 1978. "Hedonic housing prices and the demand for clean air," Journal of Environmental Economics and Management, Elsevier, vol. 5(1), pages 81-102, March.
    2. Jinhak Kim & Mohit Tawarmalani & Jean-Philippe P. Richard, 2022. "Convexification of Permutation-Invariant Sets and an Application to Sparse Principal Component Analysis," Mathematics of Operations Research, INFORMS, vol. 47(4), pages 2547-2584, November.
    3. Jonathan H. Owen & Sanjay Mehrotra, 2002. "On the Value of Binary Expansions for General Mixed-Integer Linear Programs," Operations Research, INFORMS, vol. 50(5), pages 810-819, October.
    4. JOURNEE, Michel & NESTEROV, Yurii & RICHTARIK, Peter & SEPULCHRE, Rodolphe, 2010. "Generalized power method for sparse principal component analysis," LIDAM Reprints CORE 2232, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    5. Ompad, D.C. & Ikeda, R.M. & Shah, N. & Fuller, C.M. & Bailey, S. & Morse, E. & Kerndt, P. & Maslow, C. & Wu, Y. & Vlahov, D. & Garfein, R. & Strathdee, S.A., 2005. "Childhood sexual abuse and age at initiation of injection drug use," American Journal of Public Health, American Public Health Association, vol. 95(4), pages 703-709.
    6. Santanu S. Dey & Rahul Mazumder & Guanyi Wang, 2022. "Using ℓ 1 -Relaxation and Integer Programming to Obtain Dual Bounds for Sparse PCA," Operations Research, INFORMS, vol. 70(3), pages 1914-1932, May.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lucija Muehlenbachs & Elisheba Spiller & Christopher Timmins, 2015. "The Housing Market Impacts of Shale Gas Development," American Economic Review, American Economic Association, vol. 105(12), pages 3633-3659, December.
    2. Binyuan Chen & Simge Küçükyavuz & Suvrajeet Sen, 2011. "Finite Disjunctive Programming Characterizations for General Mixed-Integer Linear Programs," Operations Research, INFORMS, vol. 59(1), pages 202-210, February.
    3. Jianhong Shi & Qian Yang & Xiongya Li & Weixing Song, 2017. "Effects of measurement error on a class of single-index varying coefficient regression models," Computational Statistics, Springer, vol. 32(3), pages 977-1001, September.
    4. Yixuan Qiu & Jing Lei & Kathryn Roeder, 2023. "Gradient-based sparse principal component analysis with extensions to online learning," Biometrika, Biometrika Trust, vol. 110(2), pages 339-360.
    5. Anna Romanowska & Piotr Oskar Czechowski & Tomasz Owczarek & Maria Szuszkiewicz & Aneta Oniszczuk-Jastrząbek & Ernest Czermański, 2025. "Model of the Influence of Air Pollution and Other Environmental Factors on the Real Estate Market in Warsaw in 2010–2022," Sustainability, MDPI, vol. 17(16), pages 1-21, August.
    6. Villalonga, Belen, 2004. "Intangible resources, Tobin's q, and sustainability of performance differences," Journal of Economic Behavior & Organization, Elsevier, vol. 54(2), pages 205-230, June.
    7. Marwan Al-Momani, 2023. "Liu-type pretest and shrinkage estimation for the conditional autoregressive model," PLOS ONE, Public Library of Science, vol. 18(4), pages 1-23, April.
    8. Brockmeier, M., 1991. "Entwicklung und Aufhebung von Reinheitsgeboten im Nahrungsmittelbereich – Analyse und Bewertung," Proceedings “Schriften der Gesellschaft für Wirtschafts- und Sozialwissenschaften des Landbaues e.V.”, German Association of Agricultural Economists (GEWISOLA), vol. 27.
    9. Miles M Finney, 2017. "Air Quality and the Development of Los Angeles," The Review of Regional Studies, Southern Regional Science Association, vol. 47(3), pages 271-288, Fall.
    10. Terri Menke, 1987. "Economic Welfare and Urban Amenities Across Race-Sex Groups," Urban Studies, Urban Studies Journal Limited, vol. 24(2), pages 151-161, April.
    11. Suneel Babu Chatla, 2023. "Nonparametric inference for additive models estimated via simplified smooth backfitting," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(1), pages 71-97, February.
    12. Miller, Steve & Startz, Richard, 2019. "Feasible generalized least squares using support vector regression," Economics Letters, Elsevier, vol. 175(C), pages 28-31.
    13. Chunfang Zhao & Yingliang Wu & Yunfeng Chen & Guohua Chen, 2023. "Multiscale Effects of Hedonic Attributes on Airbnb Listing Prices Based on MGWR: A Case Study of Beijing, China," Sustainability, MDPI, vol. 15(2), pages 1-21, January.
    14. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    15. Prendergast, Luke A. & Li Wai Suen, Connie, 2011. "A new and practical influence measure for subsets of covariance matrix sample principal components with applications to high dimensional datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 752-764, January.
    16. repec:asg:wpaper:1006 is not listed on IDEAS
    17. Tizheng Li & Xiaojuan Kang, 2022. "Variable selection of higher-order partially linear spatial autoregressive model with a diverging number of parameters," Statistical Papers, Springer, vol. 63(1), pages 243-285, February.
    18. Deac Dan Stelian & Schebesch Klaus Bruno, 2018. "Market Forecasts and Client Behavioral Data: Towards Finding Adequate Model Complexity," Studia Universitatis „Vasile Goldis” Arad – Economics Series, Sciendo, vol. 28(3), pages 50-75, September.
    19. James Hansen & James McDonald & Panayiotis Theodossiou & Brad Larsen, 2010. "Partially Adaptive Econometric Methods For Regression and Classification," Computational Economics, Springer;Society for Computational Economics, vol. 36(2), pages 153-169, August.
    20. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    21. Juan Ignacio Zoloa, 2020. "Noise pollution and housing markets: A spatial hedonic analysis for La Plata City," Ensayos de Política Económica, Departamento de Investigación Francisco Valsecchi, Facultad de Ciencias Económicas, Pontificia Universidad Católica Argentina., vol. 3(2), pages 129-152, Octubre.

    More about this item

    Keywords

    ;
    ;
    ;
    ;
    ;

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijoc:v:37:y:2025:i:3:p:582-602. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.