Towards a pragmatic approach to compositional data analysis

Author

Michael Greenacre

Abstract

Compositional data are nonnegative data with the property of closure: that is, each set of values on their components, or so-called parts, has a fixed sum, usually 1 or 100%. The approach to compositional data analysis originated by John Aitchison uses ratios of parts as the fundamental starting point for description and modeling. I show that a compositional data set can be effectively replaced by a set of ratios, one less than the number of parts, and that these ratios describe an acyclic connected graph of all the parts. Contrary to recent literature, I show that the additive log-ratio transformation can be an excellent substitute for the original data set, as shown in an archaeological data set as well as in three other examples. I propose further that a smaller set of ratios of parts can be determined, either by expert choice or by automatic selection, which explains as much variance as required for all practical purposes. These part ratios can then be validly summarized and analyzed by conventional univariate methods, as well as multivariate methods, where the ratios are preferably log-transformed.
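The abstract's central claims about closure, additive log-ratios, and ratios forming an acyclic connected graph can be made concrete with a small sketch. The following minimal Python example is not the author's code. It assumes strictly positive parts, because log-ratios are undefined for zeros. The helper names `close`, `alr`, and `is_spanning_tree` are hypothetical.

The example does four things:
- It closes a composition.
- It computes the additive log-ratio (ALR) transformation, using the last part as reference.
- It shows that another pairwise log-ratio is recovered as a difference of ALR values.
- It checks whether a chosen set of J - 1 ratios links all J parts into an acyclic connected graph, that is, a spanning tree.

```python
import math


def close(row):
    """Rescale nonnegative parts so they sum to 1 (closure)."""
    total = sum(row)
    return [x / total for x in row]


def alr(row, ref=-1):
    """Additive log-ratio transform: log(x_j / x_ref) for every part j except
    the reference part. Assumes all parts are strictly positive."""
    ref = ref % len(row)
    return [math.log(x / row[ref]) for j, x in enumerate(row) if j != ref]


def is_spanning_tree(n_parts, ratios):
    """Check whether a list of ratios (i, k), meaning x_i / x_k, links all
    n_parts parts into an acyclic connected graph (a spanning tree)."""
    if len(ratios) != n_parts - 1:
        return False
    parent = list(range(n_parts))  # union-find forest over the parts

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, k in ratios:
        ri, rk = find(i), find(k)
        if ri == rk:  # this ratio would close a cycle
            return False
        parent[ri] = rk
    # n_parts - 1 edges and no cycle imply a single connected component
    return True


raw = [10.0, 20.0, 30.0, 40.0]  # made-up 4-part composition, illustration only
x = close(raw)
z = alr(x)  # [log(x1/x4), log(x2/x4), log(x3/x4)]

# Closure does not change ratios, so ALR values agree for raw and closed data
print(all(math.isclose(a, b) for a, b in zip(alr(raw), z)))  # True
# A pairwise log-ratio between non-reference parts is a difference of ALR values
print(math.isclose(z[0] - z[1], math.log(x[0] / x[1])))  # True
alr_edges = [(0, 3), (1, 3), (2, 3)]  # ratios x_j / x_4 form a star graph
print(is_spanning_tree(4, alr_edges))  # True
print(is_spanning_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: cycle, part 4 unlinked
```

All ALR ratios share one reference part, so their graph is a star, which is one particular spanning tree. Any other set of J - 1 ratios that connects all parts without a cycle also passes the check, for example a chain (1,2), (2,3), (3,4). Taking the last part as reference is an arbitrary choice made here for illustration.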

Suggested Citation

  • Michael Greenacre, 2017. "Towards a pragmatic approach to compositional data analysis," Economics Working Papers 1554, Department of Economics and Business, Universitat Pompeu Fabra.
  • Handle: RePEc:upf:upfgen:1554

    Download full text from publisher

    File URL: https://econ-papers.upf.edu/papers/1554.pdf
    File Function: Whole Paper
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yvonne Pittelkow & Susan R. Wilson, 2005. "Use of Principal Component Analysis and the GE-Biplot for the Graphical Exploration of Gene Expression Data," Biometrics, The International Biometric Society, vol. 61(2), pages 630-632, June.
    2. Vittadini, Giorgio & Minotti, Simona C. & Fattore, Marco & Lovaglio, Pietro G., 2007. "On the relationships among latent variables and residuals in PLS path modeling: The formative-reflective scheme," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5828-5846, August.
    3. Michael Greenacre, 2016. "Selection and statistical analysis of compositional ratios," Economics Working Papers 1551, Department of Economics and Business, Universitat Pompeu Fabra.
    4. Hanafi, Mohamed & Kiers, Henk A.L., 2006. "Analysis of K sets of data, with differential emphasis on agreement between and within sets," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 1491-1508, December.
    5. Kazi-Aoual, Frederique & Hitier, Simon & Sabatier, Robert & Lebreton, Jean-Dominique, 1995. "Refined approximations to permutation tests for multivariate inference," Computational Statistics & Data Analysis, Elsevier, vol. 20(6), pages 643-656, December.
    6. Cook, Judith A. & Razzano, Lisa & Cappelleri, Joseph C., 1996. "Canonical correlation analysis of residential and vocational outcomes following psychiatric rehabilitation," Evaluation and Program Planning, Elsevier, vol. 19(4), pages 351-363, November.
    7. Kargin, V. & Onatski, A., 2008. "Curve forecasting by functional autoregression," Journal of Multivariate Analysis, Elsevier, vol. 99(10), pages 2508-2526, November.
    8. Minjung Kyung & Ju-Hyun Park & Ji Yeh Choi, 2022. "Bayesian Mixture Model of Extended Redundancy Analysis," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 946-966, September.
    9. Miaomiao Yang & Keli Zhang & Chenlu Huang & Qinke Yang, 2022. "Effects of Content of Soil Rock Fragments on Soil Erodibility in China," IJERPH, MDPI, vol. 19(2), pages 1-19, January.
    10. Wayne DeSarbo & Heungsun Hwang & Ashley Stadler Blank & Eelco Kappe, 2015. "Constrained Stochastic Extended Redundancy Analysis," Psychometrika, Springer;The Psychometric Society, vol. 80(2), pages 516-534, June.
    11. Abby Israëls, 1986. "Reviews," Psychometrika, Springer;The Psychometric Society, vol. 51(3), pages 495-497, September.
    12. Takane, Yoshio & Jung, Sunho, 2009. "Regularized nonsymmetric correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3159-3170, June.
    13. Martínez-Ventura, Constanza & Mariño-Martínez, Ricardo & Miguélez-Márquez, Javier, 2023. "Redundancy of Centrality Measures in Financial Market Infrastructures," Latin American Journal of Central Banking (previously Monetaria), Elsevier, vol. 4(4).
    14. John Zilvinskis & Anthony A. Masseria & Gary R. Pike, 2017. "Student Engagement and Student Learning: Examining the Convergent and Discriminant Validity of the Revised National Survey of Student Engagement," Research in Higher Education, Springer;Association for Institutional Research, vol. 58(8), pages 880-903, December.
    15. Ariyawardana, A. & Bailey, W.C, 2002. "The Relationship between Core Resources and Strategies of Firms: The Case of Sri Lankan Value-Added Tea Producers," Sri Lankan Journal of Agricultural Economics, Sri Lanka Agricultural Economics Association (SAEA), vol. 4, pages 1-19.
    16. Lazraq, Aziz & Cléroux, Robert, 2001. "Statistical Inference Concerning Several Redundancy Indices," Journal of Multivariate Analysis, Elsevier, vol. 79(1), pages 71-88, October.
    17. Dray, Stephane, 2008. "On the number of principal components: A test of dimensionality based on measurements of similarity between matrices," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2228-2237, January.
    18. Jos Berge, 1985. "On the relationship between Fortier's simultaneous linear prediction and van den Wollenberg's redundancy analysis," Psychometrika, Springer;The Psychometric Society, vol. 50(1), pages 121-122, March.
    19. Susana Mendes & M. José Fernández-Gómez & Mário Jorge Pereira & Ulisses Miranda Azeiteiro & M. Purificación Galindo-Villardón, 2012. "An empirical comparison of Canonical Correspondence Analysis and STATICO in the identification of spatio-temporal ecological relationships," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 979-994, October.
    20. Michel Tenenhaus & Arthur Tenenhaus & Patrick J. F. Groenen, 2017. "Regularized Generalized Canonical Correlation Analysis: A Framework for Sequential Multiblock Component Methods," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 737-777, September.

    More about this item

    Keywords

    compositional data; log-ratio transformation; log-ratio analysis; log-ratio distance; multivariate analysis; ratios; subcompositional coherence; univariate statistics.

    JEL classification:

    • Z32 - Other Special Topics - - Tourism Economics - - - Tourism and Development
    • C19 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Other
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.