Data-driven Covariate Selection for Confounding Adjustment by Focusing on the Stability of the Effect Estimator

Authors

  • Loh, Wen Wei
  • Ren, Dongning

Abstract

Valid inference about cause-and-effect relations in observational studies requires adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, are left unadjusted, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset is truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that, after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or only outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory as it is adjusted for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the covariates chosen for adjustment. The ability to correctly select confounders and yield valid causal inference following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the proposed method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets.
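
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the priority score (the smaller of a covariate's absolute correlations with treatment and with outcome), the plain OLS effect estimator, the tolerance-based stability rule, and all function names and simulated data below are assumptions made only for illustration.

```python
import numpy as np

def ols_effect(y, a, X):
    """Coefficient of treatment `a` in an OLS fit of y on [1, a, X]."""
    design = np.column_stack([np.ones_like(a), a, X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

def prioritize(y, a, X):
    """Step 1 (assumed rule): rank covariates by the weaker of their
    absolute correlations with treatment and outcome, strongest first."""
    scores = []
    for j in range(X.shape[1]):
        r_a = abs(np.corrcoef(X[:, j], a)[0, 1])
        r_y = abs(np.corrcoef(X[:, j], y)[0, 1])
        scores.append(min(r_a, r_y))
    return np.argsort(scores)[::-1]

def effect_trajectory(y, a, X, order):
    """Step 2: effect estimate after adjusting for the first k covariates
    in priority order, for k = 0, 1, ..., p."""
    return np.array([ols_effect(y, a, X[:, order[:k]])
                     for k in range(len(order) + 1)])

def smallest_stable_subset(trajectory, tol):
    """Assumed stability rule: smallest k such that every later estimate
    stays within `tol` of the estimate at k."""
    for k in range(len(trajectory)):
        if np.all(np.abs(trajectory[k:] - trajectory[k]) <= tol):
            return k

# Simulated data (hypothetical): column 0 is a confounder, column 1 affects
# treatment only, column 2 affects outcome only, column 3 is pure noise.
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 4))
a = X[:, 0] + X[:, 1] + rng.normal(size=n)
y = 0.5 * a + X[:, 0] + X[:, 2] + rng.normal(size=n)  # true effect is 0.5

order = prioritize(y, a, X)
trajectory = effect_trajectory(y, a, X, order)
k = smallest_stable_subset(trajectory, tol=0.05)
selected = order[:k].tolist()

print("priority order:", order.tolist())
print("effect trajectory:", np.round(trajectory, 3))
print("selected covariates:", selected)
```

In this toy setting the unadjusted estimate is biased by the confounder, and the trajectory settles once the confounder is adjusted for; adding the treatment-only, outcome-only, or noise covariates afterwards should not shift the estimate systematically. The tolerance `tol` is an arbitrary choice here and would need justification in any real analysis.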

Suggested Citation

  • Loh, Wen Wei & Ren, Dongning, 2021. "Data-driven Covariate Selection for Confounding Adjustment by Focusing on the Stability of the Effect Estimator," OSF Preprints yve6u, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:yve6u
    DOI: 10.31219/osf.io/yve6u

    Download full text from publisher

    File URL: https://osf.io/download/614d89c3687972000d8797d3/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huber, Martin & Lechner, Michael & Wunsch, Conny, 2013. "The performance of estimators based on the propensity score," Journal of Econometrics, Elsevier, vol. 175(1), pages 1-21.
    2. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    3. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    4. Sung Jae Jun & Sokbae Lee, 2020. "Causal Inference under Outcome-Based Sampling with Monotonicity Assumptions," Papers 2004.08318, arXiv.org, revised Oct 2023.
    5. Rina Friedberg & Julie Tibshirani & Susan Athey & Stefan Wager, 2018. "Local Linear Forests," Papers 1807.11408, arXiv.org, revised Sep 2020.
    6. Koch, Bernard & Sainburg, Tim & Geraldo, Pablo & Jiang, Song & Sun, Yizhou & Foster, Jacob G., 2021. "Deep Learning of Potential Outcomes," SocArXiv aeszf, Center for Open Science.
    7. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    8. Konan Alain N'Ghauran & Corinne Autant-Bernard, 2020. "Assessing the collaboration and network additionality of innovation policies: a counterfactual approach to the French cluster policy," Post-Print halshs-03128972, HAL.
    9. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 29-50, Spring.
    10. Alberto Abadie & Anish Agarwal & Raaz Dwivedi & Abhin Shah, 2024. "Doubly Robust Inference in Causal Latent Factor Models," Papers 2402.11652, arXiv.org, revised Apr 2024.
    11. Siying Guo & Jianxuan Liu & Qiu Wang, 2022. "Effective Learning During COVID-19: Multilevel Covariates Matching and Propensity Score Matching," Annals of Data Science, Springer, vol. 9(5), pages 967-982, October.
    12. Graham, Bryan S. & Pinto, Cristine Campos de Xavier, 2022. "Semiparametrically efficient estimation of the average linear regression function," Journal of Econometrics, Elsevier, vol. 226(1), pages 115-138.
    13. Michael J. Weir & Thomas W. Sproul, 2019. "Identifying Drivers of Genetically Modified Seafood Demand: Evidence from a Choice Experiment," Sustainability, MDPI, vol. 11(14), pages 1-21, July.
    14. Peter Bühlmann & Domagoj Ćevid, 2020. "Deconfounding and Causal Regularisation for Stability and External Validity," International Statistical Review, International Statistical Institute, vol. 88(S1), pages 114-134, December.
    15. Sung Jae Jun & Sokbae (Simon) Lee, 2020. "Causal inference in case-control studies," CeMMAP working papers CWP19/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Michael Schomaker & Christian Heumann, 2020. "When and when not to use optimal model averaging," Statistical Papers, Springer, vol. 61(5), pages 2221-2240, October.
    17. Donna Feir & Kelly Foley & Maggie E. C. Jones, 2021. "The Distributional Impacts of Active Labor Market Programs for Indigenous Populations," AEA Papers and Proceedings, American Economic Association, vol. 111, pages 216-220, May.
    18. Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Papers 2206.08503, arXiv.org, revised Oct 2022.
    19. Guido W. Imbens, 2015. "Matching Methods in Practice: Three Examples," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 373-419.
    20. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Mar 2024.
