Exact methods for variable selection in principal component analysis: Guide functions and pre-selection

Listed author(s):
  • Pacheco, Joaquín
  • Casado, Silvia
  • Porras, Santiago

    A variable selection problem is analysed for use in Principal Component Analysis (PCA). The set of original variables is divided into disjoint groups, and the selected subset of variables must contain at least one variable from each group. The objective function under consideration is the sum of the first eigenvalues of the correlation matrix of the subset of selected variables. This problem, with no known prior references, poses two difficulties beyond those of the standard variable selection problem: the evaluation of the objective function, and the restriction that the selected subset must contain elements from all of the groups. Two Branch & Bound methods are proposed to obtain exact solutions. They incorporate two strategies: the first is the use of “fast” guide functions as alternatives to the objective function; the second is the preselection of variables that help to satisfy the group restriction. The computational tests show that both strategies are very efficient and achieve significant reductions in calculation times.
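
    The objective and the group restriction described in the abstract can be illustrated with a minimal sketch. It is not the authors' Branch & Bound method: it enumerates every candidate subset exhaustively, which is only practical for a handful of variables. Several details are assumptions made for illustration and are not stated in the abstract: the subset size `p` is fixed, the objective is maximized, "first eigenvalues" means the `q` largest, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def objective(corr, subset, q):
    """Sum of the q largest eigenvalues of the correlation submatrix.

    Assumption: "first eigenvalues" is read as the q largest ones.
    """
    sub = corr[np.ix_(subset, subset)]   # rows and columns of the selected variables
    eig = np.linalg.eigvalsh(sub)        # eigenvalues of a symmetric matrix, ascending
    return float(eig[::-1][:q].sum())


def covers_all_groups(subset, groups):
    """True if the subset holds at least one variable from every group."""
    chosen = set(subset)
    return all(chosen & set(group) for group in groups)


def best_subset_exhaustive(corr, groups, p, q):
    """Brute-force reference search (hypothetical helper, not Branch & Bound).

    Assumptions: exactly p variables are selected and the objective is maximized.
    """
    n = corr.shape[0]
    best_val, best_set = -np.inf, None
    for subset in combinations(range(n), p):
        if not covers_all_groups(subset, groups):
            continue                     # violates the group restriction
        val = objective(corr, list(subset), q)
        if val > best_val:
            best_val, best_set = val, subset
    return best_set, best_val
```

    Each evaluation of the objective requires an eigenvalue decomposition, which is the evaluation cost the abstract refers to; a reference search of this kind is mainly useful for checking an exact method on small instances.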


    Article provided by Elsevier in its journal Computational Statistics & Data Analysis.

    Volume (Year): 57 (2013)
    Issue: 1
    Pages: 95-111


    Handle: RePEc:eee:csdana:v:57:y:2013:i:1:p:95-111
    DOI: 10.1016/j.csda.2012.06.014



    This information is provided to you by IDEAS at the Research Division of the Federal Reserve Bank of St. Louis using RePEc data.