
A comparison of simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis

Author

Listed:
  • Brusco, Michael J.

Abstract

Variable selection is a venerable problem in multivariate statistics. Simulated annealing is one of a variety of metaheuristics that can be gainfully employed for variable selection; however, its effectiveness is influenced by algorithm design features such as the construction of the initial subset, the maximum and minimum temperatures, the cooling scheme, and the process for generating trial subsets in the neighborhood of the incumbent subset. These design features were manipulated to produce 24 versions of a simulated annealing algorithm for the problem of selecting exactly p out of m candidate variables. The versions were then compared within the contexts of principal component analysis and discriminant analysis. The results suggest some complex and interesting interactions among the design features, yet some robust versions across the two studies were established.
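
The design features named in the abstract (initial subset, maximum and minimum temperatures, cooling scheme, trial-subset generation) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `anneal_subset`, the parameter defaults, the random initial subset, the one-in/one-out swap neighborhood, the geometric cooling schedule, and the toy objective `toy_score` are all assumptions chosen for illustration. A real application would pass a PCA or discriminant-analysis criterion as `score`.

```python
import math
import random


def anneal_subset(m, p, score, t_max=1.0, t_min=1e-3, cooling=0.9,
                  trials_per_temp=50, seed=0):
    """Pick exactly p of m variables (indices 0..m-1) to maximise score(subset).

    Hypothetical illustration; all defaults are assumed, not taken from the paper.
    """
    if not 0 < p < m:
        raise ValueError("need 0 < p < m")
    rng = random.Random(seed)
    current = rng.sample(range(m), p)            # random initial subset
    outside = [j for j in range(m) if j not in current]
    current_score = score(current)
    best, best_score = list(current), current_score

    temp = t_max
    while temp > t_min:
        for _ in range(trials_per_temp):
            # Trial subset: swap one selected variable with one unselected one,
            # so the subset size stays exactly p.
            i = rng.randrange(p)
            k = rng.randrange(m - p)
            current[i], outside[k] = outside[k], current[i]
            trial_score = score(current)
            delta = trial_score - current_score
            # Metropolis rule: always accept improvements; accept a worse
            # trial with probability exp(delta / temp).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current_score = trial_score
                if current_score > best_score:
                    best, best_score = list(current), current_score
            else:
                current[i], outside[k] = outside[k], current[i]  # undo the swap
        temp *= cooling                          # geometric cooling step
    return sorted(best), best_score


# Toy objective (assumed): the sum of fixed per-variable weights.
weights = [0.2, 1.5, 0.7, 3.1, 0.1, 2.4, 0.9, 1.1]


def toy_score(subset):
    return sum(weights[j] for j in subset)


subset, value = anneal_subset(m=len(weights), p=3, score=toy_score)
```

Changing `t_max`, `t_min`, `cooling`, or the way the initial and trial subsets are built gives different versions of the same algorithm, which is the kind of variation the study compares.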

Suggested Citation

  • Brusco, Michael J., 2014. "A comparison of simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 38-53.
  • Handle: RePEc:eee:csdana:v:77:y:2014:i:c:p:38-53
    DOI: 10.1016/j.csda.2014.03.001

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947314000668
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Fouskakis, Dimitris & Draper, David, 2008. "Comparing Stochastic Optimization Methods for Variable Selection in Binary Outcome Prediction, With Application to Health Policy," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1367-1381.
    2. Michael J. Brusco & Larry W. Jacobs & Robert J. Bongiorno & Duane V. Lyons & Baoxing Tang, 1995. "Improving Personnel Scheduling at Airline Stations," Operations Research, INFORMS, vol. 43(5), pages 741-751, October.
    3. Alex Murillo & J. Fernando Vera & Willem J. Heiser, 2005. "A Permutation-Translation Simulated Annealing Algorithm for L1 and L2 Unidimensional Scaling," Journal of Classification, Springer;The Classification Society, vol. 22(1), pages 119-138, June.
    4. Gatu, Cristian & Yanev, Petko I. & Kontoghiorghes, Erricos J., 2007. "A graph approach to generate all possible regression submodels," Computational Statistics & Data Analysis, Elsevier, vol. 52(2), pages 799-815, October.
    5. Pacheco, Joaquín & Casado, Silvia & Núñez, Laura, 2009. "A variable selection method based on Tabu search for logistic regression models," European Journal of Operational Research, Elsevier, vol. 199(2), pages 506-511, December.
    6. Gordon D. Murray, 1977. "A Cautionary Note on Selection of Variables in Discriminant Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 26(3), pages 246-250, November.
    7. Brusco, Michael J. & Steinley, Douglas, 2011. "Exact and approximate algorithms for variable selection in linear discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 123-131, January.
    8. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    9. Eva Vande Gaer & Eva Ceulemans & Iven Mechelen & Peter Kuppens, 2012. "The CLASSI-N Method for the Study of Sequential Processes," Psychometrika, Springer;The Psychometric Society, vol. 77(1), pages 85-105, January.
    10. J. Fernando Vera & Willem J. Heiser & Alex Murillo, 2007. "Global Optimization in Any Minkowski Metric: A Permutation-Translation Simulated Annealing Algorithm for Multidimensional Scaling," Journal of Classification, Springer;The Classification Society, vol. 24(2), pages 277-301, September.
    11. Hofmann, Marc & Gatu, Cristian & Kontoghiorghes, Erricos John, 2007. "Efficient algorithms for computing the best subset regression models for large-scale problems," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 16-29, September.
    12. Michael Brusco & Renu Singh & Douglas Steinley, 2009. "Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis," Psychometrika, Springer;The Psychometric Society, vol. 74(4), pages 705-726, December.
    13. Michael Brusco & Hans-Friedrich Köhn, 2009. "Exemplar-Based Clustering via Simulated Annealing," Psychometrika, Springer;The Psychometric Society, vol. 74(3), pages 457-475, September.
    14. Dray, Stephane, 2008. "On the number of principal components: A test of dimensionality based on measurements of similarity between matrices," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2228-2237, January.
    15. Pacheco, Joaquín & Casado, Silvia & Porras, Santiago, 2013. "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 95-111.
    16. Eva Ceulemans & Iven Mechelen, 2008. "CLASSI: A classification model for the study of sequential processes and individual differences therein," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 107-124, March.
    17. Pacheco, Joaquin & Casado, Silvia & Nunez, Laura & Gomez, Olga, 2006. "Analysis of new variable selection methods for discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 1463-1478, December.
    18. António Pedro Duarte Silva, 2002. "Discarding Variables in a Principal Component Analysis: Algorithms for All-Subsets Comparisons," Computational Statistics, Springer, vol. 17(2), pages 251-271, July.
    19. Yutaka Kano & Akira Harada, 2000. "Stepwise variable selection in factor analysis," Psychometrika, Springer;The Psychometric Society, vol. 65(1), pages 7-22, March.
    20. Cadima, Jorge & Cerdeira, J. Orestes & Minhoto, Manuel, 2004. "Computational aspects of algorithms for variable selection in the context of principal components," Computational Statistics & Data Analysis, Elsevier, vol. 47(2), pages 225-236, September.
    21. W. J. Krzanowski, 1987. "Selection of Variables to Preserve Multivariate Data Structure, Using Principal Components," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 36(1), pages 22-33, March.
    22. Krzanowski, Wojtek J. & Hand, David J., 2009. "A simple method for screening variables before clustering microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(7), pages 2747-2753, May.
    23. Manuel Laguna & Fred Glover, 1993. "Bandwidth Packing: A Tabu Search Approach," Management Science, INFORMS, vol. 39(4), pages 492-500, April.
    24. M.J. Brusco & L.W. Jacobs & G.M. Thompson, 1999. "A morphing procedure to supplement a simulated annealing heuristic for cost- and coverage-correlated set-covering problems," Annals of Operations Research, Springer, vol. 86(0), pages 611-627, January.
    25. Joost Rosmalen & Patrick Groenen & Javier Trejos & William Castillo, 2009. "Optimization Strategies for Two-Mode Partitioning," Journal of Classification, Springer;The Classification Society, vol. 26(2), pages 155-181, August.
    26. Fernando Chiyoshi & Roberto Galvão, 2000. "A statistical analysis of simulated annealing applied to the p-median problem," Annals of Operations Research, Springer, vol. 96(1), pages 61-74, November.
    27. Michael Brusco & Hans-Friedrich Köhn, 2009. "Erratum to: Exemplar-Based Clustering via Simulated Annealing," Psychometrika, Springer;The Psychometric Society, vol. 74(4), pages 755-755, December.
    28. J. Ramsay & Jos Berge & G. Styan, 1984. "Matrix correlation," Psychometrika, Springer;The Psychometric Society, vol. 49(3), pages 403-423, September.
    29. Kristine Hogarty & Jeffrey Kromrey & John Ferron & Constance Hines, 2004. "Selection of variables in exploratory factor analysis: An empirical comparison of a stepwise and traditional approach," Psychometrika, Springer;The Psychometric Society, vol. 69(4), pages 593-611, December.
    30. Trendafilov, Nickolay T. & Jolliffe, Ian T., 2007. "DALASS: Variable selection in discriminant analysis via the LASSO," Computational Statistics & Data Analysis, Elsevier, vol. 51(8), pages 3718-3736, May.
    31. I. T. Jolliffe, 1973. "Discarding Variables in a Principal Component Analysis. II: Real Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 22(1), pages 21-31, March.
    32. Michael J. Brusco, 2006. "On the Performance of Simulated Annealing for Large-Scale L2 Unidimensional Scaling," Journal of Classification, Springer;The Classification Society, vol. 23(2), pages 255-268, September.
    33. Eva Ceulemans & Iven Mechelen & Iwin Leenen, 2007. "The Local Minima Problem in Hierarchical Classes Analysis: An Evaluation of a Simulated Annealing Algorithm and Various Multistart Procedures," Psychometrika, Springer;The Psychometric Society, vol. 72(3), pages 377-391, September.
    34. Duarte Silva, António Pedro, 2001. "Efficient Variable Screening for Multivariate Analysis," Journal of Multivariate Analysis, Elsevier, vol. 76(1), pages 35-62, January.
    35. I. T. Jolliffe, 1972. "Discarding Variables in a Principal Component Analysis. I: Artificial Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 21(2), pages 160-173, June.

    Citations

    Cited by:

    1. David Ayllón & Roberto Gil-Pita & Fernando Seoane, 2016. "Detection and Classification of Measurement Errors in Bioimpedance Spectroscopy," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-19, June.
    2. JinXing Che & YouLong Yang, 2017. "Stochastic correlation coefficient ensembles for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(10), pages 1721-1742, July.
    3. Michael Brusco & Hannah J Stolze & Michaela Hoffman & Douglas Steinley, 2017. "A simulated annealing heuristic for maximum correlation core/periphery partitioning of binary networks," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-17, May.
    4. Ortelli, Nicola & Hillel, Tim & Pereira, Francisco C. & de Lapparent, Matthieu & Bierlaire, Michel, 2021. "Assisted specification of discrete choice models," Journal of choice modelling, Elsevier, vol. 39(C).
    5. Daiva Stanelyte & Virginijus Radziukynas, 2019. "Review of Voltage and Reactive Power Control Algorithms in Electrical Distribution Networks," Energies, MDPI, vol. 13(1), pages 1-26, December.
    6. Zhenzhong Wang & Zhengyuan Zhu & Cindy Yu, 2020. "Variable Selection in Macroeconomic Forecasting with Many Predictors," Papers 2007.10160, arXiv.org.
    7. Paz, Alexander & Arteaga, Cristian & Cobos, Carlos, 2019. "Specification of mixed logit models assisted by an optimization framework," Journal of choice modelling, Elsevier, vol. 30(C), pages 50-60.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pacheco, Joaquín & Casado, Silvia & Porras, Santiago, 2013. "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 95-111.
    2. Brusco, Michael J. & Steinley, Douglas, 2011. "Exact and approximate algorithms for variable selection in linear discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 123-131, January.
    3. Michael Brusco & Renu Singh & Douglas Steinley, 2009. "Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis," Psychometrika, Springer;The Psychometric Society, vol. 74(4), pages 705-726, December.
    4. Fouskakis, D., 2012. "Bayesian variable selection in generalized linear models using a combination of stochastic optimization methods," European Journal of Operational Research, Elsevier, vol. 220(2), pages 414-422.
    5. Michael Brusco & Hans-Friedrich Köhn, 2009. "Exemplar-Based Clustering via Simulated Annealing," Psychometrika, Springer;The Psychometric Society, vol. 74(3), pages 457-475, September.
    6. Cadima, Jorge & Cerdeira, J. Orestes & Minhoto, Manuel, 2004. "Computational aspects of algorithms for variable selection in the context of principal components," Computational Statistics & Data Analysis, Elsevier, vol. 47(2), pages 225-236, September.
    7. Cumming, J.A. & Wooff, D.A., 2007. "Dimension reduction via principal variables," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 550-565, September.
    8. Jolliffe, Ian, 2022. "A 50-year personal journey through time with principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    9. António Pedro Duarte Silva, 2002. "Discarding Variables in a Principal Component Analysis: Algorithms for All-Subsets Comparisons," Computational Statistics, Springer, vol. 17(2), pages 251-271, July.
    10. Stephen L. France & Wen Chen & Yumin Deng, 2017. "ADCLUS and INDCLUS: analysis, experimentation, and meta-heuristic algorithm extensions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(2), pages 371-393, June.
    11. Paz, Alexander & Arteaga, Cristian & Cobos, Carlos, 2019. "Specification of mixed logit models assisted by an optimization framework," Journal of choice modelling, Elsevier, vol. 30(C), pages 50-60.
    12. Michael Brusco & Hans-Friedrich Köhn & Stephanie Stahl, 2008. "Heuristic Implementation of Dynamic Programming for Matrix Permutation Problems in Combinatorial Data Analysis," Psychometrika, Springer;The Psychometric Society, vol. 73(3), pages 503-522, September.
    13. Tangian, Andranik S., 2017. "Selection of questions for VAAs and the VAA-based elections," Working Paper Series in Economics 100, Karlsruhe Institute of Technology (KIT), Department of Economics and Management.
    14. Bauer, Jan O. & Drabant, Bernhard, 2021. "Principal loading analysis," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    15. du Jardin, Philippe & Séverin, Eric, 2012. "Forecasting financial failure using a Kohonen map: A comparative study to improve model stability over time," European Journal of Operational Research, Elsevier, vol. 221(2), pages 378-396.
    16. A. Pedro Duarte Silva, 2009. "Exact and heuristic algorithms for variable selection: Extended Leaps and Bounds," Working Papers de Economia (Economics Working Papers) 01, Católica Porto Business School, Universidade Católica Portuguesa.
    17. Sergio Camiz & Valério D. Pillar, 2018. "Identifying the Informational/Signal Dimension in Principal Component Analysis," Mathematics, MDPI, vol. 6(11), pages 1-16, November.
    18. Colosimo Bianca Maria & Moya Ester Gutierrez & Moroni Giovanni & Petrò Stefano, 2008. "Statistical Sampling Strategies for Geometric Tolerance Inspection by CMM," Stochastics and Quality Control, De Gruyter, vol. 23(1), pages 109-121, January.
    19. Lee, Christopher Thomas & Guzman, David & Ponath, Claudia & Tieu, Lina & Riley, Elise & Kushel, Margot, 2016. "Residential patterns in older homeless adults: Results of a cluster analysis," Social Science & Medicine, Elsevier, vol. 153(C), pages 131-140.
    20. Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
