Printed from https://ideas.repec.org/a/tsj/stataj/v10y2010i2p259-266.html

Multivariate outlier detection in Stata

Authors

  • Vincenzo Verardi (University of Namur)
  • Catherine Dehon (Universite libre de Bruxelles)

Abstract

Before implementing any multivariate statistical analysis based on empirical covariance matrices, it is important to check whether outliers are present because their existence could induce significant biases. In this article, we present the minimum covariance determinant estimator, which is commonly used in robust statistics to estimate location parameters and multivariate scales. These estimators can be used to robustify Mahalanobis distances and to identify outliers. Verardi and Croux (2009, Stata Journal 9: 439–453; 2010, Stata Journal 10: 313) programmed this estimator in Stata and made it available with the mcd command. The implemented algorithm is relatively fast and, as we show in the simulation example section, outperforms the methods already available in Stata, such as the Hadi method. Copyright 2010 by StataCorp LP.
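
As a rough illustration of the idea in the abstract, the sketch below computes robust Mahalanobis distances from a minimum covariance determinant (MCD) fit and flags points beyond a chi-square cutoff. It is written in Python, not Stata, and is not the implementation behind the mcd command. Everything in it is an assumption made for illustration: the data are bivariate and simulated, the search uses a simplified concentration-step scheme with random three-point starts, no consistency correction or reweighting step is applied, and the cutoff is the 0.975 quantile of a chi-square distribution with 2 degrees of freedom, which equals -2 ln(0.025). The function names (mean_cov, det2, sq_dist, mcd_2d) are hypothetical.

```python
import math
import random


def mean_cov(pts):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det2(cov):
    """Determinant of a symmetric 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def sq_dist(p, center, cov):
    """Squared Mahalanobis distance of p, via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det2(cov)


def mcd_2d(pts, h, n_starts=50, n_csteps=20, seed=0):
    """Simplified MCD: look for the h-point subset with the smallest
    covariance determinant using random starts and concentration steps."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        subset = rng.sample(pts, 3)  # small random starting subset
        for _ in range(n_csteps):
            center, cov = mean_cov(subset)
            if det2(cov) <= 1e-12:  # degenerate (near-collinear) subset
                break
            # Concentration step: keep the h points closest to the current fit.
            subset = sorted(pts, key=lambda p: sq_dist(p, center, cov))[:h]
        center, cov = mean_cov(subset)
        d = det2(cov)
        if d > 1e-12 and (best is None or d < best[0]):
            best = (d, center, cov)
    if best is None:
        raise ValueError("no non-degenerate subset found")
    return best[1], best[2]


# Simulated data (an assumption): 40 clean points plus 10 clustered outliers.
rng = random.Random(1)
clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
outliers = [(8 + 0.1 * i, 8 - 0.1 * i) for i in range(10)]
points = clean + outliers

center, cov = mcd_2d(points, h=int(0.75 * len(points)))
cutoff = -2 * math.log(1 - 0.975)  # chi-square(2) quantile at 0.975
flags = [sq_dist(p, center, cov) > cutoff for p in points]
```

Because the location and scatter estimates come from the most concentrated subset rather than from all observations, the clustered outliers cannot pull the fit toward themselves, which is what makes the resulting distances useful for detection. Without a consistency correction the raw scatter estimate is somewhat too small, so a few clean points may also be flagged.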

Suggested Citation

  • Vincenzo Verardi & Catherine Dehon, 2010. "Multivariate outlier detection in Stata," Stata Journal, StataCorp LLC, vol. 10(2), pages 259-266, June.
  • Handle: RePEc:tsj:stataj:v:10:y:2010:i:2:p:259-266
    Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj10-2/st0192/

    Download full text from publisher

    File URL: http://www.stata-journal.com/article.html?article=st0192
    Download Restriction: no

    References listed on IDEAS

    1. Hubert, Mia & Van Driessen, Katrien, 2004. "Fast and robust discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 45(2), pages 301-320, March.
    2. Vincenzo Verardi & Christophe Croux, 2009. "Robust regression in Stata," Stata Journal, StataCorp LLC, vol. 9(3), pages 439-453, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Jianzhou & Xiong, Shenghua, 2014. "A hybrid forecasting model based on outlier detection and fuzzy time series – A case study on Hainan wind farm of China," Energy, Elsevier, vol. 76(C), pages 526-541.
    2. Adam C. Sales & Ben B. Hansen, 2020. "Limitless Regression Discontinuity," Journal of Educational and Behavioral Statistics, vol. 45(2), pages 143-174, April.
    3. Song, Zisheng, 2021. "The capitalization of school quality in rents in the Beijing housing market: A propensity score method," Working Paper Series 21/7, Royal Institute of Technology, Department of Real Estate and Construction Management & Banking and Finance.
    4. Christian Pfeifer & Joachim Wagner, 2014. "Age and gender effects of workforce composition on productivity and profits: Evidence from a new type of data for German enterprises," Contemporary Economics, Vizja University, vol. 8(1), March.
    5. Luigi Moretti & Paola Valbonesi, 2012. "Subcontracting in Public Procurement: An Empirical Investigation," "Marco Fanno" Working Papers 0154, Dipartimento di Scienze Economiche "Marco Fanno".
    6. Appio, Francesco Paolo & Leone, Daniele & Platania, Federico & Schiavone, Francesco, 2020. "Why are rewards not delivered on time in rewards-based crowdfunding campaigns? An empirical exploration," Technological Forecasting and Social Change, Elsevier, vol. 157(C).
    7. Kohei Kubota, 2017. "Intergenerational Wealth Elasticity in Japan," The Japanese Economic Review, Japanese Economic Association, vol. 68(4), pages 470-496, December.
    8. Dolabella, Marcelo & Mesquita Moreira, Mauricio, 2022. "Fighting Global Warming: Is Trade Policy in Latin America and the Caribbean a Help or a Hindrance?: Technical Appendix," IDB Publications (Working Papers) 12526, Inter-American Development Bank.
    9. Victor Ginsburgh & Olivier Gergaud & F. Livat, "undated". "Succes: Talent, Intelligence or Beauty?," ULB Institutional Repository 2013/151568, ULB -- Universite Libre de Bruxelles.
    10. Jan Ámos Víšek, 2015. "Estimating the Model with Fixed and Random Effects by a Robust Method," Methodology and Computing in Applied Probability, Springer, vol. 17(4), pages 999-1014, December.
    11. Neil Foster-McGregor & Anders Isaksson & Florian Kaulich, 2014. "Importing, exporting and performance in sub-Saharan African manufacturing firms," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 150(2), pages 309-336, May.
    12. Alper Sinan & B. Barıs Alkan, 2015. "A useful approach to identify the multicollinearity in the presence of outliers," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(5), pages 986-993, May.
    13. Joachim Wagner, 2014. "Exports, foreign direct investments and productivity: are services firms different?," The Service Industries Journal, Taylor & Francis Journals, vol. 34(1), pages 24-37, January.
    14. Horváth, Roman & Podpiera, Anca, 2012. "Heterogeneity in bank pricing policies: The Czech evidence," Economic Systems, Elsevier, vol. 36(1), pages 87-108.
    15. Joachim Wagner & John P. Weche Gelübcke, 2015. "Access to finance, foreign ownership and foreign takeovers in Germany," Applied Economics, Taylor & Francis Journals, vol. 47(29), pages 3092-3112, June.
    16. de Rassenfosse, Gaétan & Jaffe, Adam B., 2018. "Econometric evidence on the depreciation of innovations," European Economic Review, Elsevier, vol. 101(C), pages 625-642.
    17. Wagner Joachim, 2013. "The Great Export Recovery in German Manufacturing Industries, 2009/2010," Review of Economics, De Gruyter, vol. 64(3), pages 325-340, December.
    18. Carlo Ciccarelli & Alberto Dalmazzo & Tiziano Razzolini, 2024. "Sicilian sulphur and mafia: resources, working conditions and the practice of violence," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 18(2), pages 531-565, May.
    19. Enrico D’Elia & Francesca Faedda & Giacomo Giannone, 2020. "Un modello statistico per il monitoraggio delle entrate tributarie (MoME)," Working Papers wp2020-5, Ministry of Economy and Finance, Department of Finance.
    20. Goetghebuer, Tatiana, 2011. "Productive inefficiency in patriarchal family farms: evidence from Mali," Proceedings of the German Development Economics Conference, Berlin 2011 34, Verein für Socialpolitik, Research Committee Development Economics.
