IDEAS home Printed from https://ideas.repec.org/p/upf/upfgen/1554.html

Towards a pragmatic approach to compositional data analysis

Author

Michael Greenacre

Abstract

Compositional data are nonnegative data with the property of closure: that is, the values of each observation's components, or so-called parts, have a fixed sum, usually 1 or 100%. The approach to compositional data analysis originated by John Aitchison uses ratios of parts as the fundamental starting point for description and modeling. I show that a compositional data set can be effectively replaced by a set of ratios numbering one less than the number of parts, and that these ratios describe an acyclic connected graph of all the parts. Contrary to recent literature, I demonstrate that the additive log-ratio transformation can be an excellent substitute for the original data set, using an archaeological data set and three other examples. I propose further that a smaller set of part ratios can be determined, either by expert choice or by automatic selection, that explains as much variance as required for all practical purposes. These part ratios can then be validly summarized and analyzed by conventional univariate methods, as well as by multivariate methods, where the ratios are preferably log-transformed.
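The claim that a set of ratios, one less than the number of parts, can stand in for the whole composition can be checked numerically. The following is a minimal sketch on synthetic data (not the paper's archaeological data set), assuming an unweighted total log-ratio variance and a plain least-squares projection as the measure of "explained variance". The function names `close`, `alr`, `clr` and `explained_variance` are illustrative choices, not taken from the paper.

```python
import numpy as np

def close(x):
    """Closure: rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def alr(x, ref=-1):
    """Additive log-ratios: log of every other part relative to part `ref`."""
    logx = np.log(x)
    return np.delete(logx, ref, axis=1) - logx[:, [ref]]

def clr(x):
    """Centred log-ratios: log parts minus the row mean of the log parts."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def explained_variance(x, pairs):
    """Share of (unweighted) log-ratio variance captured by the log-ratios
    log(x_i / x_j) for (i, j) in `pairs`, via least-squares projection of
    the column-centred CLR data onto the column-centred log-ratios."""
    logx = np.log(x)
    z = np.column_stack([logx[:, i] - logx[:, j] for i, j in pairs])
    z = z - z.mean(axis=0)
    y = clr(x)
    y = y - y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(z, y, rcond=None)
    fitted = z @ coef
    return (fitted ** 2).sum() / (y ** 2).sum()

# Synthetic 5-part compositions (assumed data, for illustration only).
rng = np.random.default_rng(0)
comp = close(rng.lognormal(size=(50, 5)))

# ALR with the last part as reference: a star-shaped tree on the 5 parts.
star = [(0, 4), (1, 4), (2, 4), (3, 4)]
# Another connected acyclic graph on the same parts: a chain.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
# A smaller set of ratios, as might be picked by an expert.
subset = [(0, 4), (1, 2)]

ev_star = explained_variance(comp, star)
ev_chain = explained_variance(comp, chain)
ev_subset = explained_variance(comp, subset)
print(ev_star, ev_chain, ev_subset)
```

Under these assumptions, any four log-ratios that link the five parts in a tree reproduce the full log-ratio variance up to rounding, while the two-ratio subset captures only part of it. How much is "enough for practical purposes" is the judgment the abstract leaves to expert choice or automatic selection.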

Suggested Citation

  • Michael Greenacre, 2017. "Towards a pragmatic approach to compositional data analysis," Economics Working Papers 1554, Department of Economics and Business, Universitat Pompeu Fabra.
  • Handle: RePEc:upf:upfgen:1554

    Download full text from publisher

    File URL: https://econ-papers.upf.edu/papers/1554.pdf
    File Function: Whole Paper
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yvonne Pittelkow & Susan R. Wilson, 2005. "Use of Principal Component Analysis and the GE-Biplot for the Graphical Exploration of Gene Expression Data," Biometrics, The International Biometric Society, vol. 61(2), pages 630-632, June.
    2. Cook, Judith A. & Razzano, Lisa & Cappelleri, Joseph C., 1996. "Canonical correlation analysis of residential and vocational outcomes following psychiatric rehabilitation," Evaluation and Program Planning, Elsevier, vol. 19(4), pages 351-363, November.
    3. Kargin, V. & Onatski, A., 2008. "Curve forecasting by functional autoregression," Journal of Multivariate Analysis, Elsevier, vol. 99(10), pages 2508-2526, November.
    4. Minjung Kyung & Ju-Hyun Park & Ji Yeh Choi, 2022. "Bayesian Mixture Model of Extended Redundancy Analysis," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 946-966, September.
    5. Miaomiao Yang & Keli Zhang & Chenlu Huang & Qinke Yang, 2022. "Effects of Content of Soil Rock Fragments on Soil Erodibility in China," IJERPH, MDPI, vol. 19(2), pages 1-19, January.
    6. Abby Israëls, 1986. "Reviews," Psychometrika, Springer;The Psychometric Society, vol. 51(3), pages 495-497, September.
    7. Takane, Yoshio & Jung, Sunho, 2009. "Regularized nonsymmetric correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3159-3170, June.
    8. John Zilvinskis & Anthony A. Masseria & Gary R. Pike, 2017. "Student Engagement and Student Learning: Examining the Convergent and Discriminant Validity of the Revised National Survey of Student Engagement," Research in Higher Education, Springer;Association for Institutional Research, vol. 58(8), pages 880-903, December.
    9. Lazraq, Aziz & Cléroux, Robert, 2001. "Statistical Inference Concerning Several Redundancy Indices," Journal of Multivariate Analysis, Elsevier, vol. 79(1), pages 71-88, October.
    10. Yvonne Milker & Manuel F G Weinkauf & Jürgen Titschack & Andre Freiwald & Stefan Krüger & Frans J Jorissen & Gerhard Schmiedl, 2017. "Testing the applicability of a benthic foraminiferal-based transfer function for the reconstruction of paleowater depth changes in Rhodes (Greece) during the early Pleistocene," PLOS ONE, Public Library of Science, vol. 12(11), pages 1-30, November.
    11. Cuadras, Carles M. & Greenacre, Michael, 2022. "A short history of statistical association: From correlation to correspondence analysis to copulas," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    12. Vittadini, Giorgio & Minotti, Simona C. & Fattore, Marco & Lovaglio, Pietro G., 2007. "On the relationships among latent variables and residuals in PLS path modeling: The formative-reflective scheme," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5828-5846, August.
    13. Maria Iannario & Rosaria Romano & Domenico Vistocco, 2023. "Dyadic analysis for multi-block data in sport surveys analytics," Annals of Operations Research, Springer, vol. 325(1), pages 701-714, June.
    14. Pietro Lovaglio & Giorgio Vittadini, 2013. "Multilevel dimensionality-reduction methods," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(2), pages 183-207, June.
    15. Michael Greenacre & Paul Lewi, 2009. "Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements," Journal of Classification, Springer;The Classification Society, vol. 26(1), pages 29-54, April.
    16. Yupeng Sang & Zhiqin Long & Xuming Dan & Jiajun Feng & Tingting Shi & Changfu Jia & Xinxin Zhang & Qiang Lai & Guanglei Yang & Hongying Zhang & Xiaoting Xu & Huanhuan Liu & Yuanzhong Jiang & Pär K. In, 2022. "Genomic insights into local adaptation and future climate-induced vulnerability of a keystone forest tree in East Asia," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    17. Bougeard, S. & Hanafi, M. & Qannari, E.M., 2008. "Continuum redundancy-PLS regression: A simple continuum approach," Computational Statistics & Data Analysis, Elsevier, vol. 52(7), pages 3686-3696, March.
    18. Jan Graffelman, 2005. "Enriched biplots for canonical correlation analysis," Journal of Applied Statistics, Taylor & Francis Journals, vol. 32(2), pages 173-188.
    19. Lovaglio, Pietro Giorgio & Vacca, Gianmarco, 2016. "%ERA: A SAS Macro for Extended Redundancy Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(c01).
    20. Martínez-Ventura, Constanza & Mariño-Martínez, Ricardo & Miguélez-Márquez, Javier, 2023. "Redundancy of Centrality Measures in Financial Market Infrastructures," Latin American Journal of Central Banking (previously Monetaria), Elsevier, vol. 4(4).

    More about this item

    Keywords

    compositional data; log-ratio transformation; log-ratio analysis; log-ratio distance; multivariate analysis; ratios; subcompositional coherence; univariate statistics

    JEL classification:

    • Z32 - Other Special Topics - - Tourism Economics - - - Tourism and Development
    • C19 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Other
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
