When and How to Deal with Clustered Errors in Regression Models

Authors

  • James G. MacKinnon (Queen's University)
  • Matthew D. Webb (Carleton University)

Abstract

We discuss when and how to deal with possibly clustered errors in linear regression models. Specifically, we discuss situations in which a regression model may plausibly be treated as having error terms that are arbitrarily correlated within known clusters but uncorrelated across them. The methods we discuss include various covariance matrix estimators, possibly combined with various methods of obtaining critical values, several bootstrap procedures, and randomization inference. Special attention is given to models with few treated clusters and clusters that vary a lot in size, where inference may be problematic. Two empirical examples illustrate the methods we discuss and the concerns we raise, and a simulation experiment illustrates the consequences of over-clustering and under-clustering.
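As a rough illustration of the simplest kind of covariance matrix estimator the abstract refers to, the sketch below computes OLS coefficients with cluster-robust standard errors. It is not taken from the paper: the function name `cluster_robust_se`, the simulated data, and the choice of the common small-sample factor G(N-1)/((G-1)(N-k)), often called CV1, are assumptions made here for illustration only.

```python
import numpy as np

def cluster_robust_se(X, y, cluster_ids):
    """Return OLS coefficients and CV1 cluster-robust standard errors.

    Hypothetical helper for illustration; errors may be arbitrarily
    correlated within clusters but are assumed uncorrelated across them.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    clusters = np.unique(cluster_ids)
    G = len(clusters)

    # "Meat" of the sandwich: sum over clusters of the outer product
    # of the cluster-level score vectors X_g' u_g.
    meat = np.zeros((k, k))
    for g in clusters:
        mask = cluster_ids == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # Assumed small-sample correction (CV1).
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Usage on simulated data: 20 clusters of 10 observations each,
# with a cluster-level random effect in the error term.
rng = np.random.default_rng(42)
ids = np.repeat(np.arange(20), 10)
x = rng.normal(size=ids.size)
u = rng.normal(size=20)[ids] + rng.normal(size=ids.size)
X_demo = np.column_stack([np.ones(ids.size), x])
y_demo = 1.0 + 0.5 * x + u
beta_hat, se_hat = cluster_robust_se(X_demo, y_demo, ids)
```

The abstract notes that inference may be problematic with few treated clusters or clusters of very different sizes, so standard errors of this kind are only a starting point. The paper also discusses alternative critical values, bootstrap procedures, and randomization inference, none of which this sketch implements.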

Suggested Citation

  • James G. MacKinnon & Matthew D. Webb, 2020. "When and How to Deal with Clustered Errors in Regression Models," Working Paper 1421, Economics Department, Queen's University.
  • Handle: RePEc:qed:wpaper:1421
    Download full text from publisher

    File URL: https://www.econ.queensu.ca/sites/econ.queensu.ca/files/wpaper/qed_wp_1421.pdf
    File Function: Second version 2020
    Download Restriction: no

    Citations

    Cited by:

    1. Bruno Ferman, 2019. "A simple way to assess inference methods," Papers 1912.08772, arXiv.org, revised Dec 2020.

    More about this item

    Keywords

    clustered data; cluster-robust variance estimator; CRVE; wild cluster bootstrap; robust inference

    JEL classification:

    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C21 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Cross-Sectional Models; Spatial Models; Treatment Effect Models
    • C23 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Models with Panel Data; Spatio-temporal Models
