
On Using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions

Author

Listed:
  • Wei‐Chien Chang

Abstract

In applying principal components for reducing the dimension of the data before clustering, it has ordinarily been the practice to use components with the largest eigenvalues. We prove, by means of a mixture of two multivariate normal distributions, that this practice is not justified in general. A relationship between the distance of the two subpopulations and any subset of principal components is derived, showing that the components with the larger eigenvalues do not necessarily contain more information (distance). This result is further demonstrated through hypothetical situations as well as real situations that use actual data. The effect of scaling the variables on the distribution of the information to different components is investigated. An application to a mixture of two normal distributions is illustrated by utilizing a set of generated data in which the information is concentrated in the components with the largest and the smallest eigenvalues.
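
The central claim of the abstract can be illustrated with a small numerical sketch. This is not the paper's derivation or data: it assumes two groups sharing a common within-group covariance `sigma`, a mean difference `delta`, and a mixing proportion `p` (all hypothetical names and values), and it measures the "information" in each principal component as the squared mean difference along that component divided by the within-group variance along it.

```python
import numpy as np

def per_component_separation(sigma, delta, p=0.5):
    """Return (eigenvalues, squared separation along each principal component).

    sigma : within-group covariance, assumed common to both groups
    delta : difference between the two group means
    p     : mixing proportion of the first group
    """
    sigma = np.asarray(sigma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Population covariance of the two-group mixture.
    total = sigma + p * (1.0 - p) * np.outer(delta, delta)
    eigvals, eigvecs = np.linalg.eigh(total)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Mean difference and within-group variance along each component.
    shift = eigvecs.T @ delta
    within_var = np.einsum("ji,jk,ki->i", eigvecs, sigma, eigvecs)
    return eigvals, shift**2 / within_var

# Illustrative values only: the groups differ along the low-variance variable.
eigvals, separation = per_component_separation(
    sigma=np.diag([10.0, 0.1]), delta=[0.0, 1.0], p=0.5
)
print("eigenvalues:", eigvals)
print("separation :", separation)
```

With these illustrative values the first component has the largest eigenvalue (10) but zero separation, while the second component (eigenvalue 0.35) carries all of it, so keeping only the leading component would discard the group structure entirely.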

Suggested Citation

  • Wei‐Chien Chang, 1983. "On Using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 32(3), pages 267-275, November.
  • Handle: RePEc:bla:jorssc:v:32:y:1983:i:3:p:267-275
    DOI: 10.2307/2347949
    Citations


    Cited by:

    1. Michael C. Thrun & Alfred Ultsch, 2021. "Using Projection-Based Clustering to Find Distance- and Density-Based Clusters in High-Dimensional Data," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 280-312, July.
    2. Edward J. Bedrick, 2020. "Data reduction prior to inference: Are there consequences of comparing groups using a t‐test based on principal component scores?," Biometrics, The International Biometric Society, vol. 76(2), pages 508-517, June.
    3. Wang, Zihan & Daeipour, Mohamad & Xu, Hongyi, 2023. "Quantification and propagation of Aleatoric uncertainties in topological structures," Reliability Engineering and System Safety, Elsevier, vol. 233(C).
    4. Robert Kapłon, 2006. "A retrospective review of categorical data analysis – theory and marketing practice," Operations Research and Decisions, Wroclaw University of Science and Technology, Faculty of Management, vol. 16(1), pages 55-72.
    5. Andrews, Jeffrey L., 2018. "Addressing overfitting and underfitting in Gaussian model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 160-171.
    6. Douglas Steinley & Lawrence Hubert, 2008. "Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal," Psychometrika, Springer;The Psychometric Society, vol. 73(4), pages 647-664, December.
    7. Heungsun Hwang & Hec Montréal & William Dillon & Yoshio Takane, 2006. "An Extension of Multiple Correspondence Analysis for Identifying Heterogeneous Subgroups of Respondents," Psychometrika, Springer;The Psychometric Society, vol. 71(1), pages 161-171, March.
    8. Cappozzo, Andrea & Greselin, Francesca & Murphy, Thomas Brendan, 2021. "Robust variable selection for model-based learning in presence of adulteration," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    9. Bouveyron, Charles & Brunet-Saumard, Camille, 2014. "Model-based clustering of high-dimensional data: A review," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 52-78.
    10. Ahlquist, John S. & Breunig, Christian, 2009. "Country clustering in comparative political economy," MPIfG Discussion Paper 09/5, Max Planck Institute for the Study of Societies.
    11. McLachlan, G. J. & Peel, D. & Bean, R. W., 2003. "Modelling high-dimensional data by mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 379-388, January.
    12. Liang, Faming, 2007. "Use of SVD-based probit transformation in clustering gene expression profiles," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6355-6366, August.
    13. Bouveyron, Charles & Brunet, Camille, 2012. "Theoretical and practical considerations on the convergence properties of the Fisher-EM algorithm," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 29-41.
    14. Yoshikazu Terada, 2015. "Strong consistency of factorial K-means clustering," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(2), pages 335-357, April.
    15. Banerjee, Trambak & Mukherjee, Gourab & Radchenko, Peter, 2017. "Feature screening in large scale cluster analysis," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 191-212.
    16. Floriello, Davide & Vitelli, Valeria, 2017. "Sparse clustering of functional data," Journal of Multivariate Analysis, Elsevier, vol. 154(C), pages 1-18.
    17. Zhiliang Ma & Adam Cardinal-Stakenas & Youngser Park & Michael Trosset & Carey Priebe, 2010. "Dimensionality Reduction on the Cartesian Product of Embeddings of Multiple Dissimilarity Matrices," Journal of Classification, Springer;The Classification Society, vol. 27(3), pages 307-321, November.
    18. Laura Anderlucci & Francesca Fortunato & Angela Montanari, 2022. "High-Dimensional Clustering via Random Projections," Journal of Classification, Springer;The Classification Society, vol. 39(1), pages 191-216, March.
    19. A. Penttinen & W. Krzanowski & J. Kettenring & F. Rohlf & William Day & B. Weir & John Kececioglu & N. Ohsumi & Peter Willett, 1993. "Book reviews," Journal of Classification, Springer;The Classification Society, vol. 10(1), pages 125-156, January.
    20. Lazhar Labiod & Mohamed Nadif, 2021. "Efficient regularized spectral data embedding," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(1), pages 99-119, March.
    21. Douglas Steinley, 2009. "F. Murtagh (2005). Correspondence analysis and data coding with Java and R. 230 pp., US$76.00. ISBN 1584885289," Psychometrika, Springer;The Psychometric Society, vol. 74(1), pages 181-183, March.
    22. Dirk Depril & Iven Mechelen & Tom Wilderjans, 2012. "Lowdimensional Additive Overlapping Clustering," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 297-320, October.
    23. Yoshikazu Terada, 2014. "Strong Consistency of Reduced K-means Clustering," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(4), pages 913-931, December.
    24. repec:jss:jstsof:47:i05 is not listed on IDEAS
    25. Alessandro Casa & Giovanna Menardi, 2022. "Nonparametric semi-supervised classification with application to signal detection in high energy physics," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(3), pages 531-550, September.
