Printed from https://ideas.repec.org/a/spr/compst/v27y2012i4p605-626.html

Sparse principal components by semi-partition clustering

Authors

  • Doyo Enki
  • Nickolay Trendafilov

Abstract

A cluster-based method for constructing sparse principal components is proposed. The method first forms clusters of variables using a new clustering approach, called the semi-partition, which works in two steps. First, the variables are ordered sequentially according to a criterion involving the correlations between variables. Then, the ordered variables are split into two parts based on their generalized variance. The first group of variables becomes an output cluster, while the second becomes the input for another run of the sequential process. After the optimal clusters have been formed, sparse components are constructed from the singular value decomposition of the data matrix of each cluster. The method is applied to simple data sets with fewer variables (p) than observations (n), as well as to large gene expression data sets with p ≫ n. The resulting cluster-based sparse principal components are very promising as evaluated by objective criteria. The method is also compared with other existing approaches and is found to perform well. Copyright Springer-Verlag 2012
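The abstract describes the procedure only at a high level, so the Python sketch below fills in the unspecified parts with explicitly hypothetical choices and should not be read as the authors' algorithm. It assumes (1) an ordering that starts from the variable with the largest total absolute correlation and then greedily appends the variable with the largest mean absolute correlation to those already ordered, and (2) a split rule that cuts the ordered list before the first variable whose addition leaves the generalized variance (the determinant of the correlation submatrix) almost unchanged, controlled by a hypothetical `threshold` parameter. Each cluster's loading vector is taken as the first right singular vector of its standardized data block, padded with zeros on all other variables. The function names `order_variables`, `split_point`, `semi_partition` and `sparse_components` are illustrative only, and the determinant-based rule assumes each cluster has fewer variables than observations.

```python
import numpy as np


def order_variables(R, candidates):
    """Order candidate variables sequentially (hypothetical criterion).

    Start from the candidate with the largest total absolute correlation with
    the other candidates, then repeatedly append the remaining candidate whose
    mean absolute correlation with the already-ordered variables is largest.
    """
    cand = list(candidates)
    totals = np.abs(R[np.ix_(cand, cand)]).sum(axis=1)
    order = [cand[int(np.argmax(totals))]]
    remaining = [j for j in cand if j != order[0]]
    while remaining:
        scores = [np.abs(R[j, order]).mean() for j in remaining]
        nxt = remaining[int(np.argmax(scores))]
        order.append(nxt)
        remaining.remove(nxt)
    return order


def split_point(R, order, threshold=0.5):
    """Return k so that order[:k] becomes the output cluster (hypothetical rule).

    det(R_{k+1}) / det(R_k) equals 1 - R^2 of variable k+1 regressed on the
    first k, so a ratio above `threshold` means the next variable is poorly
    explained by the current group and the group is cut there.
    """
    prev = 1.0  # generalized variance of a single standardized variable
    for k in range(1, len(order)):
        if prev < 1e-12:  # degenerate prefix (e.g. more variables than observations)
            return k
        idx = order[: k + 1]
        cur = np.linalg.det(R[np.ix_(idx, idx)])
        if cur / prev > threshold:
            return k
        prev = cur
    return len(order)


def semi_partition(X, threshold=0.5):
    """Repeat ordering and splitting until every variable is in a cluster."""
    R = np.corrcoef(X, rowvar=False)
    remaining = list(range(X.shape[1]))
    clusters = []
    while remaining:
        order = order_variables(R, remaining)
        k = split_point(R, order, threshold)
        clusters.append(order[:k])  # first part: output cluster
        remaining = order[k:]       # second part: input for the next run
    return clusters


def sparse_components(X, clusters):
    """One loading vector per cluster from the SVD of that cluster's data block."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    loadings = np.zeros((X.shape[1], len(clusters)))
    for c, idx in enumerate(clusters):
        _, _, vt = np.linalg.svd(Z[:, idx], full_matrices=False)
        loadings[idx, c] = vt[0]  # nonzero only on this cluster's variables
    return loadings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    f1, f2 = rng.normal(size=(2, n))  # two independent latent factors
    X = np.column_stack([f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
                        + [f2 + 0.3 * rng.normal(size=n) for _ in range(3)])
    clusters = semi_partition(X)
    print(clusters)
    print(np.round(sparse_components(X, clusters), 3))
```

On two blocks of three variables that each share one latent factor, this sketch should recover the two blocks and return one loading column per block, with exact zeros on the other block's variables. The determinant ratio is used because det(R_{k+1})/det(R_k) equals 1 - R² of the new variable regressed on the variables already in the cluster, so a ratio close to 1 signals that a correlated group has ended.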

Suggested Citation

  • Doyo Enki & Nickolay Trendafilov, 2012. "Sparse principal components by semi-partition clustering," Computational Statistics, Springer, vol. 27(4), pages 605-626, December.
  • Handle: RePEc:spr:compst:v:27:y:2012:i:4:p:605-626
    DOI: 10.1007/s00180-011-0280-2

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00180-011-0280-2
    Download Restriction: Access to full text is restricted to subscribers.


    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. I. T. Jolliffe, 1972. "Discarding Variables in a Principal Component Analysis. I: Artificial Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 21(2), pages 160-173, June.
    2. J. N. R. Jeffers, 1967. "Two Case Studies in the Application of Principal Component Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 16(3), pages 225-236, November.
    3. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    4. Valentin Rousson & Theo Gasser, 2004. "Simple component analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 53(4), pages 539-555, November.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Kohei Adachi & Nickolay T. Trendafilov, 2016. "Sparse principal component analysis subject to prespecified cardinality of loadings," Computational Statistics, Springer, vol. 31(4), pages 1403-1427, December.
    2. Nickolay Trendafilov, 2014. "From simple structure to sparse components: a review," Computational Statistics, Springer, vol. 29(3), pages 431-454, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cumming, J.A. & Wooff, D.A., 2007. "Dimension reduction via principal variables," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 550-565, September.
    2. Sabatier, Robert & Reynès, Christelle, 2008. "Extensions of simple component analysis and simple linear discriminant analysis using genetic algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4779-4789, June.
    3. Jolliffe, Ian, 2022. "A 50-year personal journey through time with principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    4. Pacheco, Joaquín & Casado, Silvia & Porras, Santiago, 2013. "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 95-111.
    5. Carrizosa, Emilio & Guerrero, Vanesa, 2014. "Biobjective sparse principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 132(C), pages 151-159.
    6. Ronald Gunderson & Pin Ng, 2006. "Summarizing the Effect of a Wide Array of Amenity Measures into Simple Components," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 79(2), pages 313-335, November.
    7. Psaradakis, Zacharias & Vávra, Marián, 2014. "On testing for nonlinearity in multivariate time series," Economics Letters, Elsevier, vol. 125(1), pages 1-4.
    8. Bauer, Jan O. & Drabant, Bernhard, 2021. "Principal loading analysis," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    9. Trendafilov, Nickolay T. & Vines, Karen, 2009. "Simple and interpretable discrimination," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 979-989, February.
    10. Davood Hajinezhad & Qingjiang Shi, 2018. "Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications," Journal of Global Optimization, Springer, vol. 70(1), pages 261-288, January.
    11. Brusco, Michael J., 2014. "A comparison of simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 38-53.
    12. Imran Ahmad & Jung-Yong Kim, 2018. "Assessment of Whole Body and Local Muscle Fatigue Using Electromyography and a Perceived Exertion Scale for Squat Lifting," IJERPH, MDPI, vol. 15(4), pages 1-12, April.
    13. Waßenhoven, Anna & Rennings, Michael & Laibach, Natalie & Bröring, Stefanie, 2023. "What constitutes a “Key Enabling Technology” for transition processes: Insights from the bioeconomy's technological landscape," Technological Forecasting and Social Change, Elsevier, vol. 197(C).
    14. Galimberti, Giuliano & Soffritti, Gabriele, 2007. "Model-based methods to identify multiple cluster structures in a data set," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 520-536, September.
    15. Nickolay Trendafilov, 2014. "From simple structure to sparse components: a review," Computational Statistics, Springer, vol. 29(3), pages 431-454, June.
    16. Sergio Camiz & Valério D. Pillar, 2018. "Identifying the Informational/Signal Dimension in Principal Component Analysis," Mathematics, MDPI, vol. 6(11), pages 1-16, November.
    17. Juan Carlos Chávez & Felipe J. Fonseca & Manuel Gómez-Zaldívar, 2017. "Resoluciones de disputas comerciales y desempeño económico regional en México. (Commercial Disputes Resolution and Regional Economic Performance in Mexico)," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(1), pages 79-93, May.
    18. Chen, Ray-Bing & Chen, Ying & Härdle, Wolfgang K., 2014. "TVICA—Time varying independent component analysis and its application to financial data," Computational Statistics & Data Analysis, Elsevier, vol. 74(C), pages 95-109.
    19. Yan Yu Chen & Chun-Cheih Chao & Fu-Chen Liu & Po-Chen Hsu & Hsueh-Fen Chen & Shih-Chi Peng & Yung-Jen Chuang & Chung-Yu Lan & Wen-Ping Hsieh & David Shan Hill Wong, 2013. "Dynamic Transcript Profiling of Candida albicans Infection in Zebrafish: A Pathogen-Host Interaction Study," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.
    20. Plat, Richard, 2009. "Stochastic portfolio specific mortality and the quantification of mortality basis risk," Insurance: Mathematics and Economics, Elsevier, vol. 45(1), pages 123-132, August.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.