Double machine learning and automated confounder selection: A cautionary tale

Authors

  • Paul Hünermund (Copenhagen Business School, Kilevej 14A, Frederiksberg, 2000, Denmark)

  • Beyers Louw (Maastricht University, Tongersestraat 53, 6211 LM Maastricht, Netherlands)

  • Itamar Caspi (Bank of Israel, P.O. Box 780, 91007, Jerusalem, Israel)

Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
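
The sensitivity to a single "bad control" can be illustrated with a small simulation. The sketch below is not the authors' design and not a full DML implementation: it uses plain OLS partialling-out (residual-on-residual regression) without cross-fitting or machine-learning learners, which is the linear special case of the partially linear DML estimator. The data-generating process, the coefficient values, and the function names `partial_out_estimate` and `simulate` are assumptions made for illustration only. The bad control here is a collider, a variable caused by both the treatment and the outcome.

```python
import numpy as np


def partial_out_estimate(y, d, X):
    """Residualize y and d on X (with intercept) by OLS, then regress
    the outcome residuals on the treatment residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])
    res_y = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    res_d = d - X1 @ np.linalg.lstsq(X1, d, rcond=None)[0]
    return float(res_d @ res_y / (res_d @ res_d))


def simulate(n=20_000, theta=1.0, seed=0):
    """Hypothetical linear model: x confounds d and y; c is a collider."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # genuine confounder
    d = 0.8 * x + rng.normal(size=n)            # treatment
    y = theta * d + 0.8 * x + rng.normal(size=n)  # outcome, true effect = theta
    c = d + y + rng.normal(size=n)              # bad control: effect of d and y

    good = partial_out_estimate(y, d, x.reshape(-1, 1))
    bad = partial_out_estimate(y, d, np.column_stack([x, c]))
    return good, bad


if __name__ == "__main__":
    good, bad = simulate()
    print(f"controls = [x]:    estimate = {good:.3f}")
    print(f"controls = [x, c]: estimate = {bad:.3f}")
```

Under these assumed coefficients, controlling for `x` alone recovers an estimate close to the true effect of 1, while adding the collider `c` to the control set pushes the estimate toward 0. The size and sign of the distortion depend on the assumed causal structure, which is the point the abstract makes about the bias varying with the underlying model.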

Suggested Citation

  • Paul Hünermund & Beyers Louw & Itamar Caspi, 2023. "Double machine learning and automated confounder selection: A cautionary tale," Journal of Causal Inference, De Gruyter, vol. 11(1), pages 1-12, January.
  • Handle: RePEc:bpj:causin:v:11:y:2023:i:1:p:12:n:1
    DOI: 10.1515/jci-2022-0078
    Download full text from publisher

    File URL: https://doi.org/10.1515/jci-2022-0078
    Download Restriction: no


    Citations

    Cited by:

    1. Uribe, Jorge M., 2025. "Investment in intangible assets and economic complexity," Research Policy, Elsevier, vol. 54(1).
    2. Elena Kotyrlo, 2025. "Evaluation of continuous treatment," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 80, pages 93-116.
    3. Patrick Rehill & Nicholas Biddle, 2023. "Fairness Implications of Heterogeneous Treatment Effect Estimation with Machine Learning Methods in Policy-making," Papers 2309.00805, arXiv.org.
    4. Santos, Anabela M. & Coad, Alex, 2023. "Monitoring and evaluation of transformative innovation policy: Suggestions for Improvement," Socio-Economic Planning Sciences, Elsevier, vol. 90(C).
    5. Raphael Langevin, 2026. "Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients," Papers 2605.16593, arXiv.org.
    6. Bilgin, Rumeysa, 2023. "The Selection Of Control Variables In Capital Structure Research With Machine Learning," SocArXiv e26qf, Center for Open Science.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jonathan Fuhr & Philipp Berens & Dominik Papies, 2024. "Estimating Causal Effects with Double Machine Learning -- A Method Evaluation," Papers 2403.14385, arXiv.org, revised Apr 2024.
    2. Paul Hunermund & Elias Bareinboim, 2019. "Causal Inference and Data Fusion in Econometrics," Papers 1912.09104, arXiv.org, revised Mar 2023.
    3. Michael Lechner & Jana Mareckova, 2024. "Comprehensive Causal Machine Learning," Papers 2405.10198, arXiv.org, revised Feb 2025.
    4. Vira Semenova, 2020. "Generalized Lee Bounds," Papers 2008.12720, arXiv.org, revised May 2025.
    5. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," Papers 1910.00641, arXiv.org.
    6. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    7. Yuehao Bai & Jizhou Liu & Azeem M. Shaikh & Max Tabord-Meehan, 2023. "On the Efficiency of Highly Stratified Experiments," Papers 2307.15181, arXiv.org, revised Mar 2026.
    8. Yihui He & Fang Han, 2023. "On propensity score matching with a diverging number of matches," Papers 2310.14142, arXiv.org, revised Nov 2023.
    9. Philipp Bach & Victor Chernozhukov & Malte S. Kurz & Martin Spindler & Sven Klaassen, 2021. "DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R," Papers 2103.09603, arXiv.org, revised Jun 2024.
    10. Stefano Comino & Alberto Galasso & Clara Graziano, 2020. "Market Power and Patent Strategies: Evidence from Renaissance Venice," Journal of Industrial Economics, Wiley Blackwell, vol. 68(2), pages 226-269, June.
    11. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    12. Emmanuel Flachaire & Bertille Picard, 2025. "Decomposing Inequalities using Machine Learning and Overcoming Common Support Issues," Papers 2511.13433, arXiv.org.
    13. Lin, Zhexiao & Han, Fang, 2025. "On regression-adjusted imputation estimators of average treatment effects," Journal of Econometrics, Elsevier, vol. 251(C).
    14. Byron Botha & Rulof Burger & Kevin Kotzé & Neil Rankin & Daan Steenkamp, 2023. "Big data forecasting of South African inflation," Empirical Economics, Springer, vol. 65(1), pages 149-188, July.
    15. Martin Huber, 2024. "An introduction to causal discovery," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 160(1), pages 1-16, December.
    16. Mogstad, Magne & Torgovitsky, Alexander, 2024. "Instrumental variables with unobserved heterogeneity in treatment effects," Handbook of Labor Economics,, Elsevier.
    17. Francesco Decarolis & Cristina Giorgiantonio, 2020. "Corruption red flags in public procurement: new evidence from Italian calls for tenders," Questioni di Economia e Finanza (Occasional Papers) 544, Bank of Italy, Economic Research and International Relations Area.
    18. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    19. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    20. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer & Thomas Wiemann, 2025. "Model Averaging and Double Machine Learning," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 40(3), pages 249-269, April.
