Printed from https://ideas.repec.org/a/spr/stpapr/v63y2022i5d10.1007_s00362-021-01279-4.html

Local influence diagnostics with forward search in regression analysis

Authors

  • Reiko Aoki

    (Universidade de São Paulo)

  • Juan P. M. Bustamante

    (Universidade de São Paulo)

  • Gilberto A. Paula

    (Universidade de São Paulo)

Abstract

Regression analysis is one of the most widely used statistical techniques. It is well known that least squares estimates are sensitive to atypical and/or influential observations. Many methods have been proposed to detect influential observations based on case deletion (global influence). Cook (J R Stat Soc Ser B 48(2):133–169, 1986), on the other hand, developed a general and powerful local influence methodology for identifying groups of observations that may be jointly influential. However, these techniques may fail to detect masked influential observations. In this paper, we propose a methodology for detecting masked influential observations within a local influence framework, using the forward search (Atkinson and Riani, Robust diagnostic regression analysis, Springer, New York, 2000). The usefulness of the proposed methodology is illustrated with data sets that were previously analyzed in the literature for outliers and/or influential observations. In these applications, masked influential observations were successfully identified. The proposed methodology can be applied to any model for which local influence analysis (Cook 1986) is appropriate.
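The abstract combines two ingredients: Cook's (1986) local influence and the forward search of Atkinson and Riani (2000). The following is a minimal NumPy sketch of both for ordinary least squares. It is not the authors' procedure. It assumes case-weight perturbation with σ² held fixed at its estimate s², in which case the curvature matrix is proportional to E H E (E = diag(residuals), H = hat matrix). It also assumes a simplified forward search that starts from the best of a set of random elemental subsets (an LMS-type start). Monitoring the maximum curvature along the search is only one illustrative way to combine the two, and the function names (`local_influence_dmax`, `forward_search`, `monitor`) and their parameters are hypothetical.

```python
import numpy as np


def local_influence_dmax(X, y):
    """Local influence (Cook 1986) for OLS under case-weight perturbation.

    Assumption: beta is the parameter of interest and sigma^2 is held fixed
    at s^2, so the normal curvature along unit direction l is
    C_l = (2 / s^2) * l' E H E l. Returns (d_max, C_max).
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                           # OLS residuals
    s2 = e @ e / (n - p)                       # residual variance estimate
    H = X @ np.linalg.pinv(X.T @ X) @ X.T      # hat matrix
    B = (e[:, None] * H) * e[None, :]          # E H E without forming E
    eigvals, eigvecs = np.linalg.eigh(B)       # ascending eigenvalues
    d_max = eigvecs[:, -1]                     # direction of maximum curvature
    C_max = 2.0 * eigvals[-1] / s2
    return d_max, C_max


def forward_search(X, y, m0=None, n_starts=200, seed=0):
    """Yield the subset (sorted index array) used at each forward-search step.

    Simplified sketch: start from the random elemental subset with the
    smallest median squared residual over all n cases; then, at each step,
    fit OLS on the current subset of size m and keep the m + 1 cases with
    the smallest squared residuals. Cases may leave as well as enter.
    """
    n, p = X.shape
    m0 = p if m0 is None else m0
    rng = np.random.default_rng(seed)
    best, best_crit = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=m0, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best, best_crit = idx, crit
    subset = np.sort(best)
    yield subset
    for m in range(m0, n):
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        r2 = (y - X @ beta) ** 2               # squared residuals for all n cases
        subset = np.sort(np.argsort(r2, kind="stable")[: m + 1])
        yield subset


def monitor(X, y, **kwargs):
    """List (m, C_max, case with largest |d_max| component) along the search."""
    n, p = X.shape
    out = []
    for subset in forward_search(X, y, **kwargs):
        if len(subset) <= p:
            continue  # s^2 is undefined with no residual degrees of freedom
        d, c = local_influence_dmax(X[subset], y[subset])
        out.append((len(subset), c, int(subset[np.argmax(np.abs(d))])))
    return out


if __name__ == "__main__":
    # Hypothetical simulated data: a straight line plus one shifted case.
    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 30)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
    y[15] += 10.0
    for m, c, top in monitor(X, y)[-5:]:
        print(m, round(c, 2), top)
```

Because cases can be interchanged between steps, the sketch does not assume that the subsets are nested. A sharp rise in the monitored C_max when a particular case enters is one natural signal to inspect that case more closely.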

Suggested Citation

  • Reiko Aoki & Juan P. M. Bustamante & Gilberto A. Paula, 2022. "Local influence diagnostics with forward search in regression analysis," Statistical Papers, Springer, vol. 63(5), pages 1477-1497, October.
  • Handle: RePEc:spr:stpapr:v:63:y:2022:i:5:d:10.1007_s00362-021-01279-4
    DOI: 10.1007/s00362-021-01279-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-021-01279-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00362-021-01279-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Gilberto A. Paula, 1993. "Assessing local influence in restricted regression models," Computational Statistics & Data Analysis, Elsevier, vol. 16(1), pages 63-79, June.
    2. Fukang Zhu & Shuangzhe Liu & Lei Shi, 2016. "Local influence analysis for Poisson autoregression with an application to stock transaction data," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 70(1), pages 4-25, February.
    3. Anthony C. Atkinson & Marco Riani & Andrea Cerioli, 2018. "Cluster detection and clustering with random start forward searches," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(5), pages 777-798, April.
    4. Andrea Cerioli & Alessio Farcomeni & Marco Riani, 2019. "Wild adaptive trimming for robust estimation and cluster analysis," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 46(1), pages 235-256, March.
    5. Cibele M. Russo & Gilberto A. Paula & Reiko Aoki, 2009. "Influence diagnostics in nonlinear mixed-effects elliptical models," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4143-4156, October.
    6. Aurea Grané & Giancarlo Manzi & Silvia Salini, 2021. "Smart Visualization of Mixed Data," Stats, MDPI, vol. 4(2), pages 1-14, June.
    7. Pedro Galeano & Daniel Peña, 2019. "Rejoinder on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 363-368, June.
    8. Cibele M. Russo & Gilberto A. Paula & Francisco José A. Cysneiros & Reiko Aoki, 2012. "Influence diagnostics in heteroscedastic and/or autoregressive nonlinear elliptical models for correlated data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1049-1067, October.
    9. Andrea Cerioli & Marco Riani & Anthony C. Atkinson & Aldo Corbellini, 2018. "The power of monitoring: how to make the most of a contaminated multivariate sample," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 559-587, December.
    10. Pedro Galeano & Daniel Peña, 2019. "Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 289-329, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alessio Farcomeni & Antonio Punzo, 2020. "Robust model-based clustering with mild and gross outliers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(4), pages 989-1007, December.
    2. Brenton R. Clarke & Andrew Grose, 2023. "A further study comparing forward search multivariate outlier methods including ATLA with an application to clustering," Statistical Papers, Springer, vol. 64(2), pages 395-420, April.
    3. Marco Riani & Anthony C. Atkinson & Andrea Cerioli & Aldo Corbellini, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 349-352, June.
    4. Hossein Hassani & Christina Beneki & Emmanuel Sirimal Silva & Nicolas Vandeput & Dag Øivind Madsen, 2021. "The science of statistics versus data science: What is the future?," Technological Forecasting and Social Change, Elsevier, vol. 173(C).
    5. Šárka Brodinová & Peter Filzmoser & Thomas Ortner & Christian Breiteneder & Maia Rohm, 2019. "Robust and sparse k-means clustering for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 905-932, December.
    6. Andrea Cappozzo & Francesca Greselin & Thomas Brendan Murphy, 2021. "Robust variable selection for model-based learning in presence of adulteration," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    7. Huiyu Mao & Fukang Zhu & Yan Cui, 2020. "A generalized mixture integer-valued GARCH model," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(3), pages 527-552, September.
    8. Cibele M. Russo & Gilberto A. Paula & Francisco José A. Cysneiros & Reiko Aoki, 2012. "Influence diagnostics in heteroscedastic and/or autoregressive nonlinear elliptical models for correlated data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1049-1067, October.
    9. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    10. Anthony C. Atkinson & Aldo Corbellini & Marco Riani, 2017. "Robust Bayesian regression with the forward search: theory and data analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(4), pages 869-886, December.
    11. Francesca Torti & Aldo Corbellini & Anthony C. Atkinson, 2021. "fsdaSAS: a package for robust regression for very large datasets including the batch forward search," LSE Research Online Documents on Economics 109895, London School of Economics and Political Science, LSE Library.
    12. Michael Pokojovy & J. Marcus Jobe, 2022. "A robust deterministic affine-equivariant algorithm for multivariate location and scatter," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    13. Ricardo A. Maronna & Víctor J. Yohai, 2018. "Discussion of “The power of monitoring: how to make the most of a contaminated multivariate sample” by Andrea Cerioli, Marco Riani, Anthony C. Atkinson and Aldo Corbellini," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 603-604, December.
    14. Robert G. Aykroyd & Víctor Leiva & Carolina Marchant, 2018. "Multivariate Birnbaum-Saunders Distributions: Modelling and Applications," Risks, MDPI, vol. 6(1), pages 1-25, March.
    15. Shuangzhe Liu & Víctor Leiva & Dan Zhuang & Tiefeng Ma & Jorge I. Figueroa-Zúñiga, 2022. "Matrix differential calculus with applications in the multivariate linear model and its diagnostics," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    16. Fernanda De Bastiani & Audrey Mariz de Aquino Cysneiros & Miguel Uribe-Opazo & Manuel Galea, 2015. "Influence diagnostics in elliptical spatial linear models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(2), pages 322-340, June.
    17. Kang-Ping Lu & Shao-Tung Chang, 2021. "Robust Algorithms for Change-Point Regressions Using the t-Distribution," Mathematics, MDPI, vol. 9(19), pages 1-28, September.
    18. L. A. García-Escudero & A. Gordaliza & C. Matrán & A. Mayo-Iscar, 2018. "Comments on “The power of monitoring: how to make the most of a contaminated multivariate sample”," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 605-608, December.
    19. Paolo Gorgi, 2020. "Beta–negative binomial auto‐regressions for modelling integer‐valued time series with extreme observations," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1325-1347, December.
    20. Marco Riani & Anthony C. Atkinson & Francesca Torti & Aldo Corbellini, 2022. "Robust correspondence analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1381-1401, November.
