
The Analysis of Multivariate Data Using Semi-Definite Programming

Author

Listed:
  • A.H. Al-Ibrahim

Abstract

A model is presented for analyzing general multivariate data. Its prime objective is to reduce the dimensionality of the multivariate problem. The only requirement of the model is that the input to the statistical analysis be a covariance matrix, a correlation matrix, or more generally a positive semi-definite matrix. The model is parameterized by a scale parameter and a shape parameter, both of which take non-negative values smaller than unity. We first prove a well-known heuristic for minimizing rank and establish the conditions under which rank can be replaced with trace. This result allows us to solve our rank minimization problem as a Semi-Definite Programming (SDP) problem using any of a number of available solvers. We then apply the model to four case studies dealing with four well-known problems in multivariate analysis. The first problem is to determine the number of underlying factors in factor analysis (FA) or the number of retained components in principal component analysis (PCA). It is shown that our model determines the number of factors or components more efficiently than the commonly used methods. The second problem, which has received much attention in recent years because of its wide applications, concerns sparse principal components and variable selection in PCA. When applied to a data set known in the literature as the pitprop data, our approach yields PCs with larger variances than PCs derived from other approaches. The third problem concerns sensitivity analysis of multivariate models, a topic not widely researched in the literature because of its difficulty. Finally, we apply the model to a difficult problem in PCA known as the lack of scale invariance of PCA solutions: the solutions derived from analyzing the covariance matrix are generally different from (and not linearly related to) the solutions derived from analyzing the correlation matrix. Using our model, we obtain the same solution whether we analyze the correlation matrix or the covariance matrix, since the analysis uses only the signs of the correlations/covariances and not their values. This leads us to introduce a new type of PCA, called Sign PCA, whose applications in the social sciences and other fields of science we speculate on.
Copyright Classification Society of North America 2015
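
Two facts the abstract relies on can be checked numerically. The sketch below is not the author's model or code; it only illustrates, on hypothetical random data, that (1) a covariance matrix and its correlation matrix share the same sign pattern, which is why an analysis based only on signs cannot depend on the scaling of the variables, and (2) for a positive semi-definite matrix the trace equals the sum of the eigenvalues, which is the convex quantity that stands in for rank in the trace heuristic. The helper name `cov_to_corr`, the data, and the tolerance are assumptions made for illustration.

```python
import numpy as np


def cov_to_corr(cov):
    """Rescale a covariance matrix to the corresponding correlation matrix."""
    d = np.sqrt(np.diag(cov))          # standard deviations (all positive here)
    return cov / np.outer(d, d)


rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 5 variables on very different scales.
scales = np.array([1.0, 10.0, 0.1, 5.0, 100.0])
X = rng.normal(size=(200, 5)) * scales

cov = np.cov(X, rowvar=False)
corr = cov_to_corr(cov)

# Fact 1: dividing by positive standard deviations never flips a sign,
# so both matrices have the same sign pattern.
same_signs = np.array_equal(np.sign(cov), np.sign(corr))

# Fact 2: for a symmetric PSD matrix, trace = sum of eigenvalues,
# while rank = number of nonzero eigenvalues.
eigvals = np.linalg.eigvalsh(cov)
trace_matches = bool(np.isclose(np.trace(cov), eigvals.sum()))
numerical_rank = int(np.sum(eigvals > 1e-10 * eigvals.max()))  # assumed tolerance

print("same sign pattern:", same_signs)
print("trace equals eigenvalue sum:", trace_matches)
print("numerical rank:", numerical_rank)
```

Solving the actual rank minimization problem would additionally need the model's constraints and an SDP solver, neither of which is specified in the abstract, so they are left out here.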

Suggested Citation

  • A.H. Al-Ibrahim, 2015. "The Analysis of Multivariate Data Using Semi-Definite Programming," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 382-413, October.
  • Handle: RePEc:spr:jclass:v:32:y:2015:i:3:p:382-413
    DOI: 10.1007/s00357-015-9184-0

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00357-015-9184-0
    Download Restriction: Access to full text is restricted to subscribers.



    References listed on IDEAS

    1. Gale Young & A. Householder, 1938. "Discussion of a set of points in terms of their mutual distances," Psychometrika, Springer;The Psychometric Society, vol. 3(1), pages 19-22, March.
    2. Gábor Pataki, 1998. "On the Rank of Extreme Matrices in Semidefinite Programs and the Multiplicity of Optimal Eigenvalues," Mathematics of Operations Research, INFORMS, vol. 23(2), pages 339-358, May.

    Citations


    Cited by:

    1. A. H. Al-Ibrahim, 2019. "A Framework for Quantifying Qualitative Responses in Pairwise Experiments," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 471-492, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fengzhen Zhai & Qingna Li, 2020. "A Euclidean distance matrix model for protein molecular conformation," Journal of Global Optimization, Springer, vol. 76(4), pages 709-728, April.
    2. Xinzhen Zhang & Chen Ling & Liqun Qi, 2011. "Semidefinite relaxation bounds for bi-quadratic optimization problems with quadratic constraints," Journal of Global Optimization, Springer, vol. 49(2), pages 293-311, February.
    3. Panpan Yu & Qingna Li, 2018. "Ordinal Distance Metric Learning with MDS for Image Ranking," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 35(01), pages 1-19, February.
    4. Bomze, Immanuel M. & Gabl, Markus, 2023. "Optimization under uncertainty and risk: Quadratic and copositive approaches," European Journal of Operational Research, Elsevier, vol. 310(2), pages 449-476.
    5. Chao Ding & Hou-Duo Qi, 2017. "Convex Euclidean distance embedding for collaborative position localization with NLOS mitigation," Computational Optimization and Applications, Springer, vol. 66(1), pages 187-218, January.
    6. Stefan Sremac & Fei Wang & Henry Wolkowicz & Lucas Pettersson, 2019. "Noisy Euclidean distance matrix completion with a single missing node," Journal of Global Optimization, Springer, vol. 75(4), pages 973-1002, December.
    7. Madeleine Udell & Stephen Boyd, 2016. "Bounding duality gap for separable problems with linear constraints," Computational Optimization and Applications, Springer, vol. 64(2), pages 355-378, June.
    8. Zha, Hongyuan & Zhang, Zhenyue, 2007. "Continuum Isomap for manifold learnings," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 184-200, September.
    9. Anthony Man-Cho So & Yinyu Ye & Jiawei Zhang, 2008. "A Unified Theorem on SDP Rank Reduction," Mathematics of Operations Research, INFORMS, vol. 33(4), pages 910-920, November.
    10. Immanuel Bomze & Markus Gabl, 2021. "Interplay of non-convex quadratically constrained problems with adjustable robust optimization," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 93(1), pages 115-151, February.
    11. Si-Tong Lu & Miao Zhang & Qing-Na Li, 2020. "Feasibility and a fast algorithm for Euclidean distance matrix optimization with ordinal constraints," Computational Optimization and Applications, Springer, vol. 76(2), pages 535-569, June.
    12. J. Carroll, 1985. "Review," Psychometrika, Springer;The Psychometric Society, vol. 50(1), pages 133-140, March.
    13. Jos F. Sturm & Shuzhong Zhang, 2003. "On Cones of Nonnegative Quadratic Functions," Mathematics of Operations Research, INFORMS, vol. 28(2), pages 246-267, May.
    14. Małgorzata Poniatowska-Jaksch, 2021. "Energy Consumption in Central and Eastern Europe (CEE) Households in the Platform Economics," Energies, MDPI, vol. 14(4), pages 1-22, February.
    15. Im, Haesol & Wolkowicz, Henry, 2023. "Revisiting degeneracy, strict feasibility, stability, in linear programming," European Journal of Operational Research, Elsevier, vol. 310(2), pages 495-510.
    16. Boris Defourny & Ilya O. Ryzhov & Warren B. Powell, 2015. "Optimal Information Blending with Measurements in the L 2 Sphere," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 1060-1088, October.
    17. Francis Caillez & Pascale Kuntz, 1996. "A contribution to the study of the metric and Euclidean structures of dissimilarities," Psychometrika, Springer;The Psychometric Society, vol. 61(2), pages 241-253, June.
    18. Malone, Samuel W. & Tarazaga, Pablo & Trosset, Michael W., 2002. "Better initial configurations for metric multidimensional scaling," Computational Statistics & Data Analysis, Elsevier, vol. 41(1), pages 143-156, November.
    19. Aurea Grané & Rosario Romera, 2018. "On Visualizing Mixed-Type Data," Sociological Methods & Research, vol. 47(2), pages 207-239, March.
    20. Joseph Zinnes & David MacKay, 1983. "Probabilistic multidimensional scaling: Complete and incomplete data," Psychometrika, Springer;The Psychometric Society, vol. 48(1), pages 27-48, March.
