The Analysis of Multivariate Data Using Semi-Definite Programming

Author

Listed:
  • A.H. Al-Ibrahim

Abstract

A model is presented for analyzing general multivariate data. The model's prime objective is to reduce the dimensionality of the multivariate problem. The only requirement of the model is that the input data to the statistical analysis be a covariance matrix, a correlation matrix, or more generally a positive semi-definite matrix. The model is parameterized by a scale parameter and a shape parameter, both of which take on non-negative values smaller than unity. We first prove a well-known heuristic for minimizing rank and establish the conditions under which rank can be replaced with trace. This result allows us to solve our rank minimization problem as a Semi-Definite Programming (SDP) problem using any of a number of available solvers. We then apply the model to four case studies dealing with four well-known problems in multivariate analysis. The first problem is to determine the number of underlying factors in factor analysis (FA) or the number of retained components in principal component analysis (PCA). It is shown that our model determines the number of factors or components more efficiently than the commonly used methods. The second example deals with a problem that has received much attention in recent years due to its wide applications: sparse principal components and variable selection in PCA. When applied to a data set known in the literature as the pitprop data, our approach yields PCs with larger variances than PCs derived from other approaches. The third problem concerns sensitivity analysis of the multivariate models, a topic that has not been widely researched because of its difficulty. Finally, we apply the model to a difficult problem in PCA known as lack of scale invariance in the solutions of PCA. That is, the solutions derived from analyzing the covariance matrix in PCA are generally different from (and not linearly related to) the solutions derived from analyzing the correlation matrix.
Using our model, we obtain the same solution whether we analyze the correlation matrix or the covariance matrix, since the analysis utilizes only the signs of the correlations/covariances but not their values. This leads us to introduce a new type of PCA, called Sign PCA, and we speculate on its applications in the social sciences and other fields of science. Copyright Classification Society of North America 2015
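The scale-invariance claim in the abstract rests on a simple fact: a correlation is a covariance divided by two positive standard deviations, so the two matrices always share the same sign pattern. The following is a minimal sketch of that fact only, in plain Python. It is not the paper's Sign PCA or its SDP model, and the toy data and helper names (`covariance_matrix`, `correlation_matrix`, `sign_matrix`) are hypothetical, introduced purely for illustration.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (one row each)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def correlation_matrix(cov):
    """Rescale a covariance matrix by the (positive) standard deviations."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

def sign_matrix(m):
    """Entrywise sign (-1, 0, or +1) of a matrix."""
    return [[(x > 0) - (x < 0) for x in row] for row in m]

# Hypothetical toy data: three variables measured on very different scales.
data = [
    [1.0, 200.0, 0.03],
    [2.0, 180.0, 0.01],
    [3.0, 260.0, 0.02],
    [4.0, 240.0, 0.00],
    [5.0, 300.0, 0.01],
]

cov = covariance_matrix(data)
corr = correlation_matrix(cov)

# The values differ by orders of magnitude, but the sign patterns coincide.
signs_match = sign_matrix(cov) == sign_matrix(corr)
print(signs_match)
```

Any procedure that takes only the sign pattern as input therefore cannot distinguish the covariance matrix from the correlation matrix, provided every variable has nonzero variance.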

Suggested Citation

  • A.H. Al-Ibrahim, 2015. "The Analysis of Multivariate Data Using Semi-Definite Programming," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 382-413, October.
  • Handle: RePEc:spr:jclass:v:32:y:2015:i:3:p:382-413
    DOI: 10.1007/s00357-015-9184-0

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00357-015-9184-0
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Cited by:

    1. A. H. Al-Ibrahim, 2019. "A Framework for Quantifying Qualitative Responses in Pairwise Experiments," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 471-492, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fengzhen Zhai & Qingna Li, 2020. "A Euclidean distance matrix model for protein molecular conformation," Journal of Global Optimization, Springer, vol. 76(4), pages 709-728, April.
    2. Panpan Yu & Qingna Li, 2018. "Ordinal Distance Metric Learning with MDS for Image Ranking," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 35(01), pages 1-19, February.
    3. Chao Ding & Hou-Duo Qi, 2017. "Convex Euclidean distance embedding for collaborative position localization with NLOS mitigation," Computational Optimization and Applications, Springer, vol. 66(1), pages 187-218, January.
    4. Zha, Hongyuan & Zhang, Zhenyue, 2007. "Continuum Isomap for manifold learnings," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 184-200, September.
    5. Si-Tong Lu & Miao Zhang & Qing-Na Li, 2020. "Feasibility and a fast algorithm for Euclidean distance matrix optimization with ordinal constraints," Computational Optimization and Applications, Springer, vol. 76(2), pages 535-569, June.
    6. J. Carroll, 1985. "Review," Psychometrika, Springer;The Psychometric Society, vol. 50(1), pages 133-140, March.
    7. Małgorzata Poniatowska-Jaksch, 2021. "Energy Consumption in Central and Eastern Europe (CEE) Households in the Platform Economics," Energies, MDPI, vol. 14(4), pages 1-22, February.
    8. Francis Caillez & Pascale Kuntz, 1996. "A contribution to the study of the metric and Euclidean structures of dissimilarities," Psychometrika, Springer;The Psychometric Society, vol. 61(2), pages 241-253, June.
    9. Malone, Samuel W. & Tarazaga, Pablo & Trosset, Michael W., 2002. "Better initial configurations for metric multidimensional scaling," Computational Statistics & Data Analysis, Elsevier, vol. 41(1), pages 143-156, November.
    10. Aurea Grané & Rosario Romera, 2018. "On Visualizing Mixed-Type Data," Sociological Methods & Research, vol. 47(2), pages 207-239, March.
    11. Joseph Zinnes & David MacKay, 1983. "Probabilistic multidimensional scaling: Complete and incomplete data," Psychometrika, Springer;The Psychometric Society, vol. 48(1), pages 27-48, March.
    12. Samuel Messick, 1956. "An empirical evaluation of multidimensional successive intervals," Psychometrika, Springer;The Psychometric Society, vol. 21(4), pages 367-375, December.
    13. Michael W. Trosset, 2002. "Extensions of Classical Multidimensional Scaling via Variable Reduction," Computational Statistics, Springer, vol. 17(2), pages 147-163, July.
    14. Qian Zhang & Xinyuan Zhao & Chao Ding, 2021. "Matrix optimization based Euclidean embedding with outliers," Computational Optimization and Applications, Springer, vol. 79(2), pages 235-271, June.
    15. Sheng-Shiung Wu & Sing-Jie Jong & Kai Hu & Jiann-Ming Wu, 2021. "Learning Neural Representations and Local Embedding for Nonlinear Dimensionality Reduction Mapping," Mathematics, MDPI, vol. 9(9), pages 1-18, April.
    16. Kuroda, Kaori & Hashiguchi, Hiroki & Fujiwara, Kantaro & Ikeguchi, Tohru, 2014. "Reconstruction of network structures from marked point processes using multi-dimensional scaling," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 415(C), pages 194-204.
    17. François Bavaud, 2011. "On the Schoenberg Transformations in Data Analysis: Theory and Illustrations," Journal of Classification, Springer;The Classification Society, vol. 28(3), pages 297-314, October.
    18. W. Alan Nicewander & Joseph Lee Rodgers, 2022. "Obituary: Bruce McArthur Bloxom 1938–2020," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 1042-1044, September.
    19. Cornelius Fritz & Göran Kauermann, 2022. "On the interplay of regional mobility, social connectedness and the spread of COVID‐19 in Germany," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 400-424, January.
    20. Maarten M. Kampert & Jacqueline J. Meulman & Jerome H. Friedman, 2017. "rCOSA: A Software Package for Clustering Objects on Subsets of Attributes," Journal of Classification, Springer;The Classification Society, vol. 34(3), pages 514-547, October.
