IDEAS home Printed from https://ideas.repec.org/p/arx/papers/1507.02493.html

Inference in Linear Regression Models with Many Covariates and Heteroskedasticity

Author

Listed:
  • Matias D. Cattaneo
  • Michael Jansson
  • Whitney K. Newey

Abstract

The linear regression model is widely used in empirical work in Economics, Statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We give inference methods that allow for many covariates and heteroskedasticity. Our results are obtained using high-dimensional approximations, where the number of included covariates is allowed to grow as fast as the sample size. We find that all of the usual versions of Eicker-White heteroskedasticity consistent standard error estimators for linear models are inconsistent under these asymptotics. We then propose a new heteroskedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity of unknown form and the inclusion of possibly many covariates. We apply our findings to three settings: parametric linear models with many covariates, linear panel models with many fixed effects, and semiparametric semi-linear models with many technical regressors. Simulation evidence consistent with our theoretical results is also provided. The proposed methods are also illustrated with an empirical application.
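
For readers unfamiliar with the "usual" Eicker-White estimator that the abstract refers to, the following is a minimal sketch of the classical HC0 sandwich formula in Python with NumPy. It is not the new standard error formula proposed in the paper. The function name `ols_hc0`, the sample sizes, and the simulated data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np


def ols_hc0(X, y):
    """OLS coefficients with classical Eicker-White (HC0) standard errors.

    Illustrative only: this is the textbook sandwich estimator, not the
    estimator proposed in the paper.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over i of u_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Hypothetical simulated design: n observations, k regressors (incl. intercept).
rng = np.random.default_rng(0)
n, k = 500, 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Error variance depends on the first non-constant regressor (heteroskedasticity).
u = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
y = X[:, 1] + u

beta, se = ols_hc0(X, y)
print("coefficient on X[:, 1]:", beta[1], "HC0 s.e.:", se[1])
```

Raising `k` toward `n` in this sketch is one way to explore the many-covariates regime the abstract describes, under which the paper reports that estimators of this type are inconsistent.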

Suggested Citation

  • Matias D. Cattaneo & Michael Jansson & Whitney K. Newey, 2015. "Inference in Linear Regression Models with Many Covariates and Heteroskedasticity," Papers 1507.02493, arXiv.org, revised Jan 2017.
  • Handle: RePEc:arx:papers:1507.02493

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1507.02493
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. James H. Stock & Mark W. Watson, 2008. "Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression," Econometrica, Econometric Society, vol. 76(1), pages 155-174, January.
    2. J.J. Heckman & E.E. Leamer (ed.), 2007. "Handbook of Econometrics," Handbook of Econometrics, Elsevier, edition 1, volume 6, number 6b.
    3. Pedro Carneiro & James J. Heckman & Edward J. Vytlacil, 2011. "Estimating Marginal Returns to Education," American Economic Review, American Economic Association, vol. 101(6), pages 2754-2781, October.
    4. James G. MacKinnon, 2012. "Thirty Years Of Heteroskedasticity-robust Inference," Working Paper 1268, Economics Department, Queen's University.
    5. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    6. Kline, Patrick & Santos, Andres, 2012. "Higher order properties of the wild bootstrap under misspecification," Journal of Econometrics, Elsevier, vol. 171(1), pages 54-70.
    7. Koenker, Roger, 1988. "Asymptotic Theory and Econometric Practice," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 3(2), pages 139-147, April.
    8. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controls," Review of Economic Studies, Oxford University Press, vol. 81(2), pages 608-650.
    9. Newey, Whitney K., 1997. "Convergence rates and asymptotic normality for series estimators," Journal of Econometrics, Elsevier, vol. 79(1), pages 147-168, July.
    10. Alberto Abadie & Guido W. Imbens & Fanyin Zheng, 2014. "Inference for Misspecified Models With Fixed Regressors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1601-1614, December.
    11. Shurong Zheng & Dandan Jiang & Zhidong Bai & Xuming He, 2014. "Inference on multiple correlation coefficients with moderately high dimensional data," Biometrika, Biometrika Trust, vol. 101(3), pages 748-754.
    12. Chesher, Andrew, 1989. "Hajek Inequalities, Measures of Leverage and the Size of Heteroskedasticity Robust Wald Tests," Econometrica, Econometric Society, vol. 57(4), pages 971-977, July.
    13. J.J. Heckman & E.E. Leamer (ed.), 2007. "Handbook of Econometrics," Handbook of Econometrics, Elsevier, edition 1, volume 6, number 6a.
    14. Belloni, Alexandre & Chernozhukov, Victor & Chetverikov, Denis & Kato, Kengo, 2015. "Some new asymptotic theory for least squares series: Pointwise and uniform results," Journal of Econometrics, Elsevier, vol. 186(2), pages 345-366.
    15. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    16. Chesher, Andrew & Jewitt, Ian, 1987. "The Bias of a Heteroskedasticity Consistent Covariance Matrix Estimator," Econometrica, Econometric Society, vol. 55(5), pages 1217-1222, September.
    17. Joshua Angrist & Jinyong Hahn, 2004. "When to Control for Covariates? Panel Asymptotics for Estimates of Treatment Effects," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 58-72, February.
    18. Chen, Xiaohong, 2007. "Large Sample Sieve Estimation of Semi-Nonparametric Models," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 76, Elsevier.
    19. A. Belloni & V. Chernozhukov & I. Fernández‐Val & C. Hansen, 2017. "Program Evaluation and Causal Inference With High‐Dimensional Data," Econometrica, Econometric Society, vol. 85, pages 233-298, January.
    20. MacKinnon, James G. & White, Halbert, 1985. "Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties," Journal of Econometrics, Elsevier, vol. 29(3), pages 305-325, September.
    21. Ulrich K. Müller, 2013. "Risk of Bayesian Inference in Misspecified Models, and the Sandwich Covariance Matrix," Econometrica, Econometric Society, vol. 81(5), pages 1805-1849, September.
    22. Goncalves, Silvia & White, Halbert, 2005. "Bootstrap Standard Error Estimates for Linear Regression," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 970-979, September.
    23. Donald, S. G. & Newey, W. K., 1994. "Series Estimation of Semilinear Models," Journal of Multivariate Analysis, Elsevier, vol. 50(1), pages 30-40, July.

    Citations



    Cited by:

    1. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Matias D Cattaneo & Michael Jansson & Xinwei Ma, 2019. "Two-Step Estimation and Inference with Possibly Many Included Covariates," Review of Economic Studies, Oxford University Press, vol. 86(3), pages 1095-1122.
    3. Su, Liangjun & Ura, Takuya & Zhang, Yichong, 2019. "Non-separable models with high-dimensional data," Journal of Econometrics, Elsevier, vol. 212(2), pages 646-677.
    4. Di Addario, Sabrina & Kline, Patrick & Saggio, Raffaele & Solvsten, Mikkel, 2020. "It Ain't Where You're From, It's Where You're At: Hiring Origins, Firm Heterogeneity, and Wages," Institute for Research on Labor and Employment, Working Paper Series qt6191m92m, Institute of Industrial Relations, UC Berkeley.
    5. Patrick Kline & Raffaele Saggio & Mikkel Sølvsten, 2020. "Leave‐Out Estimation of Variance Components," Econometrica, Econometric Society, vol. 88(5), pages 1859-1898, September.
    6. Stanislav Anatolyev & Mikkel Sølvsten, 2020. "Testing Many Restrictions Under Heteroskedasticity," Papers 2003.07320, arXiv.org, revised Jan 2023.
    7. Kuanhao Jiang & Rajarshi Mukherjee & Subhabrata Sen & Pragya Sur, 2022. "A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond," Papers 2205.10198, arXiv.org, revised Oct 2022.
    8. Zhuan Pei & Jörn-Steffen Pischke & Hannes Schwandt, 2019. "Poorly Measured Confounders are More Useful on the Left than on the Right," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 37(2), pages 205-216, April.
    9. Jochmans, K., 2019. "Heteroskedasticity-Robust Inference in Linear Regression Models," Cambridge Working Papers in Economics 1957, Faculty of Economics, University of Cambridge.
    10. Chaohua Dong & Jiti Gao & Oliver Linton, 2017. "High dimensional semiparametric moment restriction models," Monash Econometrics and Business Statistics Working Papers 17/17, Monash University, Department of Econometrics and Business Statistics.
    11. Kaspar Wuthrich & Ying Zhu, 2019. "Omitted variable bias of Lasso-based inference methods: A finite sample analysis," Papers 1903.08704, arXiv.org, revised Sep 2021.
    12. Smith, Simon C. & Timmermann, Allan & Zhu, Yinchu, 2019. "Variable selection in panel models with breaks," Journal of Econometrics, Elsevier, vol. 212(1), pages 323-344.
    13. Yanqin Fan & Fang Han & Wei Li & Xiao-Hua Zhou, 2019. "On rank estimators in increasing dimensions," Papers 1908.05255, arXiv.org.
    14. Jose Luis Montiel Olea & Pietro Ortoleva & Mallesh M Pai & Andrea Prat, 2019. "Competing Models," Papers 1907.03809, arXiv.org, revised Nov 2021.
    15. Mittag, Nikolas, 2019. "A simple method to estimate large fixed effects models applied to wage determinants," Labour Economics, Elsevier, vol. 61(C).
    16. Ng Cheuk Fai, 2022. "Robust Inference in High Dimensional Linear Model with Cluster Dependence," Papers 2212.05554, arXiv.org.
    17. Anatolyev, Stanislav, 2021. "Mallows criterion for heteroskedastic linear regressions with many regressors," Economics Letters, Elsevier, vol. 203(C).
    18. Riccardo D'Adamo, 2018. "Cluster-Robust Standard Errors for Linear Regression Models with Many Controls," Papers 1806.07314, arXiv.org, revised Apr 2019.
    19. Valentin Verdier, 2020. "Estimation and Inference for Linear Models with Two-Way Fixed Effects and Sparsely Matched Data," The Review of Economics and Statistics, MIT Press, vol. 102(1), pages 1-16, March.
    20. Fan, Yanqin & Han, Fang & Li, Wei & Zhou, Xiao-Hua, 2020. "On rank estimators in increasing dimensions," Journal of Econometrics, Elsevier, vol. 214(2), pages 379-412.
    21. Sabrina Di Addario & Patrick Kline & Raffaele Saggio & Mikkel Soelvsten, 2022. "It ain't where you're from it's where you're at: firm effects, state dependence, and the gender wage gap," Temi di discussione (Economic working papers) 1374, Bank of Italy, Economic Research and International Relations Area.
    22. James G. MacKinnon, 2022. "Using Large Samples in Econometrics," Working Paper 1482, Economics Department, Queen's University.
    23. Richard, Patrick, 2019. "Residual bootstrap tests in linear models with many regressors," Journal of Econometrics, Elsevier, vol. 208(2), pages 367-394.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cattaneo, Matias D. & Jansson, Michael & Newey, Whitney K., 2018. "Alternative Asymptotics And The Partially Linear Model With Many Regressors," Econometric Theory, Cambridge University Press, vol. 34(2), pages 277-301, April.
    2. Matias D Cattaneo & Michael Jansson & Xinwei Ma, 2019. "Two-Step Estimation and Inference with Possibly Many Included Covariates," Review of Economic Studies, Oxford University Press, vol. 86(3), pages 1095-1122.
    3. Yang Ning & Sida Peng & Jing Tao, 2020. "Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data," Papers 2009.03151, arXiv.org.
    4. Jochmans, K., 2019. "Heteroskedasticity-Robust Inference in Linear Regression Models," Cambridge Working Papers in Economics 1957, Faculty of Economics, University of Cambridge.
    5. Qiu, Chen & Otsu, Taisuke, 2022. "Information theoretic approach to high dimensional multiplicative models: stochastic discount factor and treatment effect," LSE Research Online Documents on Economics 110494, London School of Economics and Political Science, LSE Library.
    6. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    7. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Richard H. Spady & Sami Stouli, 2018. "Simultaneous Mean-Variance Regression," Bristol Economics Discussion Papers 18/697, School of Economics, University of Bristol, UK.
    10. Robert A. Moffitt & Matthew V. Zahn, 2019. "The Marginal Labor Supply Disincentives of Welfare: Evidence from Administrative Barriers to Participation," NBER Working Papers 26028, National Bureau of Economic Research, Inc.
    11. Benedikt M. Potscher & David Preinerstorfer, 2020. "How Reliable are Bootstrap-based Heteroskedasticity Robust Tests?," Papers 2005.04089, arXiv.org, revised Nov 2021.
    12. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Jul 2022.
    13. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2011. "Inference on Treatment Effects After Selection Amongst High-Dimensional Controls," Papers 1201.0224, arXiv.org, revised May 2012.
    14. Romano, Joseph P. & Wolf, Michael, 2017. "Resurrecting weighted least squares," Journal of Econometrics, Elsevier, vol. 197(1), pages 1-19.
    15. Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Monash Econometrics and Business Statistics Working Papers 10/22, Monash University, Department of Econometrics and Business Statistics.
    16. Michael O'Hara & Christopher F. Parmeter, 2013. "Nonparametric Generalized Least Squares in Applied Regression Analysis," Pacific Economic Review, Wiley Blackwell, vol. 18(4), pages 456-474, October.
    17. Byunghoon Kang, 2018. "Inference in Nonparametric Series Estimation with Specification Searches for the Number of Series Terms," Working Papers 240829404, Lancaster University Management School, Economics Department.
    18. Byunghoon Kang, 2019. "Inference in Nonparametric Series Estimation with Specification Searches for the Number of Series Terms," Papers 1909.12162, arXiv.org, revised Feb 2020.
    19. Eric S. Lin & Ta-Sheng Chou, 2018. "Finite-sample refinement of GMM approach to nonlinear models under heteroskedasticity of unknown form," Econometric Reviews, Taylor & Francis Journals, vol. 37(1), pages 1-28, January.
    20. Pötscher, Benedikt M. & Preinerstorfer, David, 2021. "Valid Heteroskedasticity Robust Testing," MPRA Paper 107420, University Library of Munich, Germany.
