
Ultra‐high dimensional variable selection for doubly robust causal inference

Author

Listed:
  • Dingke Tang
  • Dehan Kong
  • Wenliang Pan
  • Linbo Wang

Abstract

Causal inference has been increasingly reliant on observational studies with rich covariate information. To build tractable causal procedures, such as the doubly robust estimators, it is imperative to first extract important features from high or even ultra‐high dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra‐high dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving efficiency in the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all the predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome model‐free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Synthetic and real data analyses show that our proposal performs favorably compared with existing methods in a range of realistic settings. Data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
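
For readers unfamiliar with the doubly robust estimators mentioned in the abstract, the following is a minimal sketch of the standard augmented inverse probability weighted (AIPW) estimate of an average treatment effect. It is not the authors' causal ball screening procedure and does no variable selection. It assumes the propensity scores and outcome-model predictions have already been fitted elsewhere, and the function name aipw_ate and its argument names are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def aipw_ate(
    y: Sequence[float],   # observed outcomes
    t: Sequence[int],     # treatment indicators (1 = treated, 0 = control)
    e: Sequence[float],   # fitted propensity scores P(T = 1 | X), strictly in (0, 1)
    m1: Sequence[float],  # outcome-model predictions under treatment
    m0: Sequence[float],  # outcome-model predictions under control
) -> float:
    """Textbook AIPW estimate of the average treatment effect.

    Illustrative sketch only: all nuisance estimates (e, m1, m0) are
    assumed to be supplied by the caller.
    """
    n = len(y)
    if n == 0 or not (len(t) == len(e) == len(m1) == len(m0) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1i - m0i) + ti * (yi - m1i) / ei - (1 - ti) * (yi - m0i) / (1.0 - ei)
    return total / n


if __name__ == "__main__":
    t = [1, 0, 1, 0]
    m1 = [3.0, 4.0, 5.0, 6.0]
    m0 = [1.0, 1.0, 2.0, 2.0]
    # Outcomes that the outcome model fits exactly.
    y = [m1i if ti == 1 else m0i for ti, m1i, m0i in zip(t, m1, m0)]
    # Arbitrary propensity scores: the residual terms vanish, so they do not matter here.
    e_arbitrary = [0.9, 0.2, 0.5, 0.7]
    print(aipw_ate(y, t, e_arbitrary, m1, m0))
```

When the outcome predictions fit the observed outcomes exactly, the weighted residual terms are zero and the estimate reduces to the average of m1 - m0, whatever propensity scores are supplied. When the outcome predictions are all zero, the expression reduces to the plain inverse probability weighted estimate. These two special cases illustrate the two routes by which a doubly robust estimator can remain consistent.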

Suggested Citation

  • Dingke Tang & Dehan Kong & Wenliang Pan & Linbo Wang, 2023. "Ultra‐high dimensional variable selection for doubly robust causal inference," Biometrics, The International Biometric Society, vol. 79(2), pages 903-914, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:903-914
    DOI: 10.1111/biom.13625

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13625
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    2. Susan M. Shortreed & Ashkan Ertefaie, 2017. "Outcome‐adaptive lasso: Variable selection for causal inference," Biometrics, The International Biometric Society, vol. 73(4), pages 1111-1122, December.
    3. Joseph Antonelli & Matthew Cefalu & Nathan Palmer & Denis Agniel, 2018. "Doubly robust matching estimators for high dimensional confounding adjustment," Biometrics, The International Biometric Society, vol. 74(4), pages 1171-1179, December.
    4. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," Papers 1910.00641, arXiv.org.
    5. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    6. Chunrong Ai & Oliver Linton & Kaiji Motegi & Zheng Zhang, 2021. "A unified framework for efficient estimation of general treatment models," Quantitative Economics, Econometric Society, vol. 12(3), pages 779-816, July.
    7. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    8. Xun Lu, 2015. "A Covariate Selection Criterion for Estimation of Treatment Effects," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 506-522, October.
    9. David Cheng & Abhishek Chakrabortty & Ashwin N. Ananthakrishnan & Tianxi Cai, 2020. "Estimating average treatment effects with a double‐index propensity score," Biometrics, The International Biometric Society, vol. 76(3), pages 767-777, September.
    10. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    11. Antonelli Joseph & Cefalu Matthew, 2020. "Averaging causal estimators in high dimensions," Journal of Causal Inference, De Gruyter, vol. 8(1), pages 92-107, January.
    12. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    13. Matthew Cefalu & Francesca Dominici & Nils Arvold & Giovanni Parmigiani, 2017. "Model averaged double robust estimation," Biometrics, The International Biometric Society, vol. 73(2), pages 410-421, June.
    14. Michael Lechner & Jana Mareckova, 2024. "Comprehensive Causal Machine Learning," Papers 2405.10198, arXiv.org.
    15. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    16. Christis Katsouris, 2023. "High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods," Papers 2308.16192, arXiv.org.
    17. Jun Lu & Lu Lin, 2020. "Model-free conditional screening via conditional distance correlation," Statistical Papers, Springer, vol. 61(1), pages 225-244, February.
    18. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    19. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    20. Kitagawa, Toru & Muris, Chris, 2016. "Model averaging in semiparametric estimation of treatment effects," Journal of Econometrics, Elsevier, vol. 193(1), pages 271-289.
