
Robust regression with compositional covariates including cellwise outliers

Authors
  • Nikola Štefelová

    (Palacký University)

  • Andreas Alfons

    (Erasmus Universiteit Rotterdam)

  • Javier Palarea-Albaladejo

    (Biomathematics and Statistics Scotland, JCMB)

  • Peter Filzmoser

    (Vienna University of Technology)

  • Karel Hron

    (Palacký University)

Abstract

We propose a robust procedure to estimate a linear regression model with compositional and real-valued explanatory variables. The proposed procedure is designed to be robust against individual outlying cells in the data matrix (cellwise outliers), as well as entire outlying observations (rowwise outliers). Cellwise outliers are first filtered and then imputed by robust estimates. Afterwards, rowwise robust compositional regression is performed to obtain model coefficient estimates. Simulations show that the procedure generally outperforms a traditional rowwise-only robust regression method (MM-estimator). Moreover, our procedure yields results that are better than or comparable to those of recently proposed cellwise robust regression methods (shooting S-estimator, 3-step regression), while being preferable for interpretation through the use of appropriate coordinate systems for compositional data. An application to bio-environmental data reveals that the proposed procedure, compared to other regression methods, leads to conclusions that are best aligned with established scientific knowledge.
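The three stages named in the abstract (filter cellwise outliers, impute them, then fit a rowwise robust regression in compositional coordinates) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation: the univariate median/MAD cell filter, median imputation and Huber M-estimator below are simpler stand-ins, chosen only for illustration, for the filter, robust imputation and MM-estimator used in the paper, and all function names are hypothetical. Real-valued covariates are omitted. The compositional part uses pivot (isometric log-ratio) coordinates, in which the first coordinate z_1 carries all relative information about the part placed first, so each part is moved into first position in turn.

```python
import numpy as np


def pivot_coordinates(X):
    """Pivot (ilr) coordinates of an n x D matrix of strictly positive parts.

    z_j = sqrt((D - j) / (D - j + 1)) * ln(x_j / gmean(x_{j+1}, ..., x_D)),
    for j = 1, ..., D - 1 (1-based). z_1 holds all relative information on x_1.
    """
    logX = np.log(np.asarray(X, dtype=float))
    n, D = logX.shape
    Z = np.empty((n, D - 1))
    for j in range(D - 1):                      # 0-based j corresponds to z_{j+1}
        k = D - j - 1                           # number of parts after x_{j+1}
        log_gmean_rest = logX[:, j + 1:].mean(axis=1)
        Z[:, j] = np.sqrt(k / (k + 1)) * (logX[:, j] - log_gmean_rest)
    return Z


def filter_and_impute(X, cutoff=3.0):
    """Hypothetical stand-in cell filter: flag cells whose log value lies more than
    `cutoff` robust SDs (median/MAD per column) from the column median and
    replace them by that median. Returns the imputed parts and the flag matrix."""
    logX = np.log(np.asarray(X, dtype=float))
    med = np.median(logX, axis=0)
    mad = 1.4826 * np.median(np.abs(logX - med), axis=0)
    flagged = np.abs(logX - med) > cutoff * mad
    return np.exp(np.where(flagged, med, logX)), flagged


def huber_irls(A, y, c=1.345, n_iter=100, tol=1e-8):
    """Huber M-regression with intercept via iteratively reweighted least squares.
    A simpler stand-in for an MM-estimator: starting from least squares, it
    downweights large residuals but is NOT protected against bad leverage points."""
    A1 = np.column_stack([np.ones(len(y)), A])
    beta = np.linalg.lstsq(A1, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - A1 @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust residual scale (MAD)
        if s == 0:
            break
        w = np.minimum(1.0, c / np.maximum(np.abs(r) / s, 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(A1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def robust_compositional_regression(X, y, cutoff=3.0):
    """Hypothetical pipeline: filter/impute cells, then for each part l fit the
    model with part l moved to the first position and keep the coefficient of z_1."""
    Xc, flagged = filter_and_impute(X, cutoff)
    D = Xc.shape[1]
    coefs = np.empty(D)
    for l in range(D):
        order = [l] + [k for k in range(D) if k != l]
        beta = huber_irls(pivot_coordinates(Xc[:, order]), np.asarray(y, dtype=float))
        coefs[l] = beta[1]                      # coefficient of z_1 for part l
    intercept = beta[0]                         # same across orderings (up to rounding)
    return intercept, coefs, flagged
```

Because only the coefficient of z_1 is kept from each fit, coefs[l] can be read as the effect of the (scaled) log-ratio of part l to the geometric mean of the remaining parts. Swapping the Huber step for a proper MM-estimator, and the univariate filter for a multivariate cellwise filter, would bring the sketch closer to the procedure the abstract describes.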

Suggested Citation

  • Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
  • Handle: RePEc:spr:advdac:v:15:y:2021:i:4:d:10.1007_s11634-021-00436-9
    DOI: 10.1007/s11634-021-00436-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00436-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Mike Danilov & Víctor J. Yohai & Ruben H. Zamar, 2012. "Robust Estimation of Multivariate Location and Scatter in the Presence of Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(499), pages 1178-1186, September.
    2. Claudio Agostinelli & Andy Leung & Victor Yohai & Ruben Zamar, 2015. "Rejoinder on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 484-488, September.
    3. K. Hron & P. Filzmoser & K. Thompson, 2012. "Linear regression with compositional explanatory variables," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1115-1128, November.
    4. K. Hron & M. Templ & P. Filzmoser, 2010. "Imputation of missing values for compositional data using classical and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3095-3107, December.
    5. P. Filzmoser & S. Höppner & I. Ortner & S. Serneels & T. Verdonck, 2020. "Cellwise robust M regression," Computational Statistics & Data Analysis, Elsevier, vol. 147(C).
    6. Andy Leung & Hongyang Zhang & Ruben Zamar, 2016. "Robust regression estimation and inference in the presence of cellwise and casewise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 1-11.
    7. Andy Leung & Victor Yohai & Ruben Zamar, 2017. "Multivariate location and scatter matrix estimation under cellwise and casewise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 59-76.
    8. Claudio Agostinelli & Andy Leung & Victor Yohai & Ruben Zamar, 2015. "Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 441-461, September.
    9. Matthias Templ & Alexander Kowarik & Peter Filzmoser, 2011. "Iterative stepwise regression imputation using standard and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 55(10), pages 2793-2806, October.
    10. Jafar A. Khan & Stefan Van Aelst & Ruben H. Zamar, 2007. "Robust Linear Model Selection Based on Least Angle Regression," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1289-1299, December.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Yashon O. Ouma & Ditiro B. Moalafhi & George Anderson & Boipuso Nkwae & Phillimon Odirile & Bhagabat P. Parida & Jiaguo Qi, 2022. "Dam Water Level Prediction Using Vector AutoRegression, Random Forest Regression and MLP-ANN Models Based on Land-Use and Climate Factors," Sustainability, MDPI, vol. 14(22), pages 1-31, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stephane Heritier & Maria-Pia Victoria-Feser, 2018. "Discussion of “The power of monitoring: how to make the most of a contaminated multivariate sample” by Andrea Cerioli, Marco Riani, Anthony C. Atkinson and Aldo Corbellini," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 595-602, December.
    2. Stefan Van Aelst & Ruben H. Zamar, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 360-362, June.
    3. Christophe Croux & Viktoria Öllerer, 2015. "Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 462-466, September.
    4. Frahm, Gabriel & Nordhausen, Klaus & Oja, Hannu, 2020. "M-estimation with incomplete and dependent multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    5. Leung, Andy & Yohai, Victor & Zamar, Ruben, 2017. "Multivariate location and scatter matrix estimation under cellwise and casewise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 59-76.
    6. Henry Velasco & Henry Laniado & Mauricio Toro & Víctor Leiva & Yuhlong Lio, 2020. "Robust Three-Step Regression Based on Comedian and Its Performance in Cell-Wise and Case-Wise Outliers," Mathematics, MDPI, vol. 8(8), pages 1-18, August.
    7. Leung, Andy & Zhang, Hongyang & Zamar, Ruben, 2016. "Robust regression estimation and inference in the presence of cellwise and casewise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 1-11.
    8. Giovanni Saraceno & Claudio Agostinelli, 2021. "Robust multivariate estimation based on statistical depth filters," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 935-959, December.
    9. Giovanni Saraceno & Claudio Agostinelli & Luca Greco, 2021. "Robust estimation for multivariate wrapped models," METRON, Springer;Sapienza Università di Roma, vol. 79(2), pages 225-240, August.
    10. Jan Kalina & Jan Tichavský, 2022. "The minimum weighted covariance determinant estimator for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 977-999, December.
    11. Maronna, Ricardo A. & Yohai, Victor J., 2017. "Robust and efficient estimation of multivariate scatter and location," Computational Statistics & Data Analysis, Elsevier, vol. 109(C), pages 64-75.
    12. Archimbaud, Aurore & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2018. "ICS for multivariate outlier detection with application to quality control," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 184-199.
    13. Anahita Nodehi & Mousa Golalizadeh & Mehdi Maadooliat & Claudio Agostinelli, 2021. "Estimation of parameters in multivariate wrapped models for data on a p-torus," Computational Statistics, Springer, vol. 36(1), pages 193-215, March.
    14. David J. Hand, 2018. "Statistical challenges of administrative and transaction data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 555-605, June.
    15. Bottmer, Lea & Croux, Christophe & Wilms, Ines, 2022. "Sparse regression for large data sets with outliers," European Journal of Operational Research, Elsevier, vol. 297(2), pages 782-794.
    16. Rousseeuw, Peter & Perrotta, Domenico & Riani, Marco & Hubert, Mia, 2019. "Robust Monitoring of Time Series with Application to Fraud Detection," Econometrics and Statistics, Elsevier, vol. 9(C), pages 108-121.
    17. Lafit, Ginette & Nogales Martín, Francisco Javier, 2017. "Robust and sparse estimation of high-dimensional precision matrices via bivariate outlier detection," DES - Working Papers. Statistics and Econometrics. WS 24534, Universidad Carlos III de Madrid. Departamento de Estadística.
    18. Md. Matiur Rahaman & Md. Nurul Haque Mollah, 2019. "Robustification of Gaussian Bayes Classifier by the Minimum β-Divergence Method," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 113-139, April.
    19. Mauricio Velasquez, 2016. "Compositions vs Gini: A new metric to evaluate the effects of land-income disparities," 2016 Papers pve364, Job Market Papers.
    20. Janina Janurek & Sascha Abdel Hadi & Andreas Mojzisch & Jan Alexander Häusser, 2018. "The Association of the 24 Hour Distribution of Time Spent in Physical Activity, Work, and Sleep with Emotional Exhaustion," IJERPH, MDPI, vol. 15(9), pages 1-14, September.
