The Effect of Data Contamination in Sliced Inverse Regression and Finite Sample Breakdown Point

Author

  • Ulrike Genschel (Iowa State University)

Abstract

Dimension reduction procedures have received increasing attention over the past decades. Nevertheless, the effect of data contamination or outlying data points in dimension reduction is not well understood, and the problem is compounded by the difficulty of classifying outliers in the presence of many variables. This paper formally investigates the influence of data contamination for sliced inverse regression (SIR), a prototypical dimension reduction procedure that targets a lower-dimensional subspace of a set of regressors needed to explain a response variable. We establish a general theory for how estimated reduction subspaces can be distorted through both the number and direction of outlying data points. The results depend critically on the regressor covariance structure, and the most harmful types of data contamination are shown to differ depending on whether this covariance structure is known or unknown. For example, if the covariance structure is estimated, data contamination is proven to produce an estimated subspace that is automatically orthogonal to the directions of outlying data points, constituting a potentially serious loss of information. Our main results demonstrate the degree to which data contamination causes incorrect dimension reduction, depending on the amount, magnitude, and direction of contamination. Further, by metricizing distances between dimension reduction subspaces, worst-case results for data contamination can be formulated to define a finite sample breakdown point for SIR as a measure of global robustness. Our theoretical findings are illustrated through simulation.
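
To make the setting of the abstract concrete, the following is a minimal sketch of textbook SIR together with one possible distance between subspaces. It is not the paper's code. The function names sir_directions and subspace_distance, the toy single-index model, the number of slices, the contamination scheme (a few rows with the largest responses replaced by far-away points along one coordinate axis), and the use of NumPy are all assumptions made for illustration. The distance used here is the Frobenius norm of the difference of projection matrices, which may differ from the metric used in the paper.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_dirs=1, cov=None):
    """Textbook SIR sketch; pass `cov` to treat the regressor covariance as known."""
    n, p = X.shape
    sigma = np.cov(X, rowvar=False) if cov is None else np.asarray(cov, dtype=float)
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = (X - X.mean(axis=0)) @ inv_sqrt                   # standardized regressors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):   # slices of the ordered response
        m = Z[idx].mean(axis=0)                           # slice mean of Z
        M += (len(idx) / n) * np.outer(m, m)              # weighted covariance of slice means
    _, vecs = np.linalg.eigh(M)                           # eigenvalues in ascending order
    B = inv_sqrt @ vecs[:, ::-1][:, :n_dirs]              # leading directions on the X scale
    return B / np.linalg.norm(B, axis=0)


def subspace_distance(A, B):
    """Frobenius distance between projections onto span(A) and span(B) (assumed metric)."""
    def proj(V):
        Q, _ = np.linalg.qr(V)
        return Q @ Q.T
    return np.linalg.norm(proj(A) - proj(B))


# Hypothetical toy data: y depends on X only through the first coordinate.
rng = np.random.default_rng(0)
n, p, k = 500, 5, 5
X = rng.standard_normal((n, p))
beta = np.zeros((p, 1))
beta[0, 0] = 1.0
y = X @ beta[:, 0] + 0.2 * rng.standard_normal(n)

# Illustrative contamination: the k rows with the largest y become distant
# points along the second coordinate axis (orthogonal to beta).
Xc = X.copy()
worst = np.argsort(y)[-k:]
Xc[worst] = 0.0
Xc[worst, 1] = 1000.0

d_clean = subspace_distance(sir_directions(X, y), beta)
d_known = subspace_distance(sir_directions(Xc, y, cov=np.eye(p)), beta)
d_estimated = subspace_distance(sir_directions(Xc, y), beta)
print("clean data, estimated covariance:  ", d_clean)
print("contaminated, known covariance:    ", d_known)
print("contaminated, estimated covariance:", d_estimated)
```

For one-dimensional subspaces this distance ranges from 0 (identical) to sqrt(2) (orthogonal). In this toy setup one would expect the known-covariance fit on contaminated data to be pulled toward the outlier direction, while the estimated-covariance fit downweights that direction. The sketch is only meant to show how the amount, magnitude, and direction of contamination can be varied and measured, not to reproduce the paper's results.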

Suggested Citation

  • Ulrike Genschel, 2018. "The Effect of Data Contamination in Sliced Inverse Regression and Finite Sample Breakdown Point," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 28-58, February.
  • Handle: RePEc:spr:sankha:v:80:y:2018:i:1:d:10.1007_s13171-017-0102-x
    DOI: 10.1007/s13171-017-0102-x