IDEAS home Printed from https://ideas.repec.org/a/taf/jnlasa/v110y2015i509p136-148.html

Robust Principal Component Analysis for Power Transformed Compositional Data

Author

Listed:
  • J. L. Scealy
  • Patrice de Caritat
  • Eric C. Grunsky
  • Michail T. Tsagris
  • A. H. Welsh

Abstract

Geochemical surveys collect sediment or rock samples, measure the concentration of chemical elements, and typically report these in either weight percent or parts per million (ppm). Usually a large number of elements are measured, and the distributions are often skewed, containing many potential outliers. We present a new robust principal component analysis (PCA) method for geochemical survey data that involves first transforming the compositional data onto a manifold using a relative power transformation. A flexible set of moment assumptions is made that takes the special geometry of the manifold into account. The Kent distribution moment structure arises as a special case when the chosen manifold is the hypersphere. We derive simple moment and robust estimators (RO) of the parameters, which are also applicable in high-dimensional settings. The resulting PCA based on these estimators is done in the tangent space and is related to the power transformation method used in correspondence analysis. To illustrate, we analyze major oxide data from the National Geochemical Survey of Australia. When compared with the traditional approach in the literature based on the centered log-ratio transformation, the new PCA method is shown to be more successful at dimension reduction and gives interpretable results.
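
The contrast drawn in the abstract, between the centered log-ratio transformation and a power transformation that places compositions on a hypersphere, can be illustrated with a rough sketch. This is not the authors' estimator: the helper names (`closure`, `clr`, `power_to_sphere`, `pca_variance_share`), the exponent `beta`, and the simulated gamma data are all assumptions made for illustration, and ordinary (non-robust) PCA on the transformed coordinates stands in for the tangent-space PCA described in the paper.

```python
import numpy as np


def closure(x):
    """Rescale each row of positive data so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def clr(x):
    """Centered log-ratio transform: log parts minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def power_to_sphere(x, beta=0.5):
    """Hypothetical helper: raise parts to the power beta, then scale each row
    to unit Euclidean length. For beta = 0.5 and rows summing to 1 this is
    the square-root transformation, since the squares of sqrt(x) sum to 1."""
    p = np.power(x, beta)
    return p / np.linalg.norm(p, axis=1, keepdims=True)


def pca_variance_share(z):
    """Share of total variance carried by each principal component
    (plain, non-robust PCA via the SVD of the column-centered data)."""
    zc = z - z.mean(axis=0)
    s = np.linalg.svd(zc, compute_uv=False)
    var = s ** 2
    return var / var.sum()


# Simulated 6-part compositions (illustrative data only, not survey data).
rng = np.random.default_rng(0)
comp = closure(rng.gamma(shape=2.0, scale=1.0, size=(200, 6)))

clr_coords = clr(comp)
sphere_coords = power_to_sphere(comp, beta=0.5)

print("clr variance shares:   ", np.round(pca_variance_share(clr_coords), 3))
print("sphere variance shares:", np.round(pca_variance_share(sphere_coords), 3))
```

Comparing how quickly the variance shares decay under each transformation gives a crude sense of what "more successful at dimension reduction" means. A robust variant would replace the sample covariance with a robust scatter estimate.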

Suggested Citation

  • J. L. Scealy & Patrice de Caritat & Eric C. Grunsky & Michail T. Tsagris & A. H. Welsh, 2015. "Robust Principal Component Analysis for Power Transformed Compositional Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 136-148, March.
  • Handle: RePEc:taf:jnlasa:v:110:y:2015:i:509:p:136-148
    DOI: 10.1080/01621459.2014.990563

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2014.990563
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. N. Locantore & J. Marron & D. Simpson & N. Tripoli & J. Zhang & K. Cohen & Graciela Boente & Ricardo Fraiman & Babette Brumback & Christophe Croux & Jianqing Fan & Alois Kneip & John Marden & Daniel P, 1999. "Robust principal component analysis for functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 8(1), pages 1-73, June.
    2. J. L. Scealy & A. H. Welsh, 2011. "Regression for compositional data by using distributions defined on the hypersphere," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 351-375, June.
    3. Greenacre, Michael, 2009. "Power transformations in correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3107-3116, June.
    4. Marden, John I., 1999. "Some robust estimates of principal components," Statistics & Probability Letters, Elsevier, vol. 43(4), pages 349-359, July.
    5. Taskinen, Sara & Koch, Inge & Oja, Hannu, 2012. "Robustifying principal component analysis with spatial sign vectors," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 765-774.
    6. Michael Greenacre & Paul Lewi, 2009. "Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements," Journal of Classification, Springer;The Classification Society, vol. 26(1), pages 29-54, April.

    Citations



    Cited by:

    1. Marco Stefanucci & Stefano Mazzuco, 2022. "Analysing cause‐specific mortality trends using compositional functional data analysis," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 61-83, January.
    2. Huiwen Wang & Zhichao Wang & Shanshan Wang, 2021. "Sliced inverse regression method for multivariate compositional data modeling," Statistical Papers, Springer, vol. 62(1), pages 361-393, February.
    3. Kokoszka, Piotr & Miao, Hong & Petersen, Alexander & Shang, Han Lin, 2019. "Forecasting of density functions with an application to cross-sectional and intraday returns," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1304-1317.
    4. J. L. Scealy & A. H. Welsh, 2017. "A Directional Mixed Effects Model for Compositional Expenditure Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 24-36, January.
    5. Liang, Wanfeng & Wu, Yue & Ma, Xiaoyan, 2022. "Robust sparse precision matrix estimation for high-dimensional compositional data," Statistics & Probability Letters, Elsevier, vol. 184(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guangxing Wang & Sisheng Liu & Fang Han & Chong‐Zhi Di, 2023. "Robust functional principal component analysis via a functional pairwise spatial sign operator," Biometrics, The International Biometric Society, vol. 79(2), pages 1239-1253, June.
    2. Dürre, Alexander & Tyler, David E. & Vogel, Daniel, 2016. "On the eigenvalues of the spatial sign covariance matrix in more than two dimensions," Statistics & Probability Letters, Elsevier, vol. 111(C), pages 80-85.
    3. Majumdar, Subhabrata & Chatterjee, Snigdhansu, 2022. "On weighted multivariate sign functions," Journal of Multivariate Analysis, Elsevier, vol. 191(C).
    4. Raymaekers, Jakob & Rousseeuw, Peter, 2019. "A generalized spatial sign covariance matrix," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 94-111.
    5. Tsagris, Michail & Preston, Simon & T.A. Wood, Andrew, 2016. "Improved classification for compositional data using the $\alpha$-transformation," MPRA Paper 67657, University Library of Munich, Germany.
    6. C. Croux & C. Dehon & A. Yadine, 2010. "The k-step spatial sign covariance matrix," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 137-150, September.
    7. Dürre, Alexander & Vogel, Daniel & Tyler, David E., 2014. "The spatial sign covariance matrix with unknown location," Journal of Multivariate Analysis, Elsevier, vol. 130(C), pages 107-117.
    8. Michail Tsagris & Simon Preston & Andrew T. A. Wood, 2016. "Improved Classification for Compositional Data Using the α-transformation," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 243-261, July.
    9. Seija Sirkiä & Sara Taskinen & Hannu Oja & David Tyler, 2009. "Tests and estimates of shape based on spatial signs and ranks," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 21(2), pages 155-176.
    10. Debruyne, Michiel & Hubert, Mia & Van Horebeek, Johan, 2010. "Detecting influential observations in Kernel PCA," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3007-3019, December.
    11. Pesonen, Maiju & Pesonen, Henri & Nevalainen, Jaakko, 2015. "Covariance matrix estimation for left-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 92(C), pages 13-25.
    12. Xu, Yangchang & Xia, Ningning, 2023. "On the eigenvectors of large-dimensional sample spatial sign covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 193(C).
    13. Lombardo, Rosaria & Camminatiello, Ida & D'Ambra, Antonello & Beh, Eric J., 2021. "Assessing the Italian tax courts system by weighted three-way log-ratio analysis," Socio-Economic Planning Sciences, Elsevier, vol. 73(C).
    14. Michiel Debruyne & Tim Verdonck, 2010. "Robust kernel principal component analysis and classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 151-167, September.
    15. Hervé Cardot & Antoine Godichon-Baggioni, 2017. "Fast estimation of the median covariation matrix with application to online robust principal components analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(3), pages 461-480, September.
    16. Dürre, Alexander & Vogel, Daniel & Fried, Roland, 2015. "Spatial sign correlation," Journal of Multivariate Analysis, Elsevier, vol. 135(C), pages 89-105.
    17. Taskinen, Sara & Koch, Inge & Oja, Hannu, 2012. "Robustifying principal component analysis with spatial sign vectors," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 765-774.
    18. Dürre, Alexander & Vogel, Daniel, 2016. "Asymptotics of the two-stage spatial sign correlation," Journal of Multivariate Analysis, Elsevier, vol. 144(C), pages 54-67.
    19. Xiongtao Dai & Zhenhua Lin & Hans‐Georg Müller, 2021. "Modeling sparse longitudinal data on Riemannian manifolds," Biometrics, The International Biometric Society, vol. 77(4), pages 1328-1341, December.
    20. Italo R. Lima & Guanqun Cao & Nedret Billor, 2019. "M-based simultaneous inference for the mean function of functional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 577-598, June.
