IDEAS home Printed from https://ideas.repec.org/a/sae/jedbes/v45y2020i2p119-142.html

Variable Selection for Causal Effect Estimation: Nonparametric Conditional Independence Testing With Random Forests

Author

Listed:
  • Bryan Keller

    (Teachers College, Columbia University)

Abstract

Widespread availability of rich educational databases facilitates the use of conditioning strategies to estimate causal effects with nonexperimental data. With dozens, hundreds, or more potential predictors, variable selection can be useful for practical reasons related to communicating results and for statistical reasons related to improving the efficiency of estimators. Background knowledge should take precedence in deciding which variables to retain. However, with many potential predictors, theory may be weak, such that functional form relationships are likely to be unknown. In this article, I propose a nonparametric method for data-driven variable selection based on permutation testing with conditional random forest variable importance. The algorithm automatically handles nonlinear relationships and interactions in its naive implementation. Through a series of Monte Carlo simulation studies and a case study with Early Childhood Longitudinal Study–K data, I find that the method performs well across a variety of scenarios where other methods fail.
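The permutation-testing idea in the abstract can be illustrated with a rough sketch. This is not the article's implementation: it swaps in an ordinary scikit-learn random forest with marginal permutation importance in place of conditional random forest importance, and the function names, simulated data, and tuning values are all assumptions made for illustration. The null distribution is built by shuffling the candidate predictor, refitting the forest, and recomputing its importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def rf_importance(X, y, col, seed):
    """Fit a forest and return the mean permutation importance of column `col`.

    Assumption: a plain random forest stands in for a conditional forest.
    """
    forest = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=5, random_state=seed
    )
    forest.fit(X, y)
    result = permutation_importance(forest, X, y, n_repeats=5, random_state=seed)
    return result.importances_mean[col]


def permutation_pvalue(X, y, col, n_perm=20, seed=0):
    """Permutation p-value for the importance of predictor `col`."""
    rng = np.random.default_rng(seed)
    observed = rf_importance(X, y, col, seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffling the column breaks its link to the outcome, so the
        # recomputed importance is a draw from an approximate null.
        X_perm = X.copy()
        X_perm[:, col] = rng.permutation(X_perm[:, col])
        null[b] = rf_importance(X_perm, y, col, seed)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# Hypothetical data: column 0 drives the outcome nonlinearly,
# columns 1 and 2 are pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = np.sin(2 * X[:, 0]) + X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)

p_signal = permutation_pvalue(X, y, col=0)
p_noise = permutation_pvalue(X, y, col=1)
print(f"p-value, signal column: {p_signal:.3f}")
print(f"p-value, noise column:  {p_noise:.3f}")
```

Because the whole column is shuffled, this sketch tests marginal rather than conditional independence when predictors are correlated; a conditional importance measure, as named in the abstract, would be needed to address that.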

Suggested Citation

  • Bryan Keller, 2020. "Variable Selection for Causal Effect Estimation: Nonparametric Conditional Independence Testing With Random Forests," Journal of Educational and Behavioral Statistics, vol. 45(2), pages 119-142, April.
  • Handle: RePEc:sae:jedbes:v:45:y:2020:i:2:p:119-142
    DOI: 10.3102/1076998619872001

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.3102/1076998619872001
    Download Restriction: no


    References listed on IDEAS

    1. Persson, Emma & Häggström, Jenny & Waernbaum, Ingeborg & de Luna, Xavier, 2017. "Data-driven algorithms for dimension reduction in causal inference," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 280-292.
    2. Jay Bhattacharya & William B. Vogt, 2007. "Do Instrumental Variables Belong in Propensity Scores?," NBER Technical Working Papers 0343, National Bureau of Economic Research, Inc.
    3. Peter M. Steiner & Thomas D. Cook & William R. Shadish, 2011. "On the Importance of Reliable Covariate Measurement in Selection Bias Adjustments Using Propensity Scores," Journal of Educational and Behavioral Statistics, vol. 36(2), pages 213-236, April.
    4. Häggström, Jenny & Persson, Emma & Waernbaum, Ingeborg & de Luna, Xavier, 2015. "CovSel: An R Package for Covariate Selection When Estimating Average Causal Effects," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i01).
    5. Jinyong Hahn, 2004. "Functional Restriction and Efficiency in Causal Inference," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 73-76, February.
    6. Lumley, Thomas, 2004. "Analysis of Complex Survey Samples," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 9(i08).
    7. Kosuke Imai & Marc Ratkovic, 2014. "Covariate balancing propensity score," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 243-263, January.
    8. Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
    9. Gruber, Susan & Laan, Mark van der, 2012. "tmle: An R Package for Targeted Maximum Likelihood Estimation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 51(i13).
    10. van der Laan Mark J. & Gruber Susan, 2010. "Collaborative Double Robust Targeted Maximum Likelihood Estimation," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-71, May.
    11. Scutari, Marco, 2010. "Learning Bayesian Networks with the bnlearn R Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 35(i03).
    12. Xavier De Luna & Ingeborg Waernbaum & Thomas S. Richardson, 2011. "Covariate selection for the nonparametric estimation of an average treatment effect," Biometrika, Biometrika Trust, vol. 98(4), pages 861-875.
    13. Kapelner, Adam & Bleich, Justin, 2016. "bartMachine: Machine Learning with Bayesian Additive Regression Trees," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i04).

    Citations


    Cited by:

    1. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jenny Häggström, 2018. "Data-driven confounder selection via Markov and Bayesian networks," Biometrics, The International Biometric Society, vol. 74(2), pages 389-398, June.
    2. Persson, Emma & Häggström, Jenny & Waernbaum, Ingeborg & de Luna, Xavier, 2017. "Data-driven algorithms for dimension reduction in causal inference," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 280-292.
    3. David Cheng & Abhishek Chakrabortty & Ashwin N. Ananthakrishnan & Tianxi Cai, 2020. "Estimating average treatment effects with a double‐index propensity score," Biometrics, The International Biometric Society, vol. 76(3), pages 767-777, September.
    4. Dingke Tang & Dehan Kong & Wenliang Pan & Linbo Wang, 2023. "Ultra‐high dimensional variable selection for doubly robust causal inference," Biometrics, The International Biometric Society, vol. 79(2), pages 903-914, June.
    5. Häggström, Jenny & Persson, Emma & Waernbaum, Ingeborg & de Luna, Xavier, 2015. "CovSel: An R Package for Covariate Selection When Estimating Average Causal Effects," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i01).
    6. Matthew Cefalu & Francesca Dominici & Nils Arvold & Giovanni Parmigiani, 2017. "Model averaged double robust estimation," Biometrics, The International Biometric Society, vol. 73(2), pages 410-421, June.
    7. Uehleke, Reinhard & Petrick, Martin & Hüttel, Silke, 2022. "Evaluations of agri-environmental schemes based on observational farm data: The importance of covariate selection," Land Use Policy, Elsevier, vol. 114(C).
    8. Xun Lu, 2015. "A Covariate Selection Criterion for Estimation of Treatment Effects," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 506-522, October.
    9. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    10. Wei Luo & Yeying Zhu & Debashis Ghosh, 2017. "On estimating regression-based causal effects using sufficient dimension reduction," Biometrika, Biometrika Trust, vol. 104(1), pages 51-65.
    11. Edward H. Kennedy & Sivaraman Balakrishnan, 2018. "Discussion of “Data-driven confounder selection via Markov and Bayesian networks” by Jenny Häggström," Biometrics, The International Biometric Society, vol. 74(2), pages 399-402, June.
    12. Joseph Antonelli & Matthew Cefalu & Nathan Palmer & Denis Agniel, 2018. "Doubly robust matching estimators for high dimensional confounding adjustment," Biometrics, The International Biometric Society, vol. 74(4), pages 1171-1179, December.
    13. Yi-Sheng Chao & Marco Scutari & Tai-Shen Chen & Chao-Jung Wu & Madeleine Durand & Antoine Boivin & Hsing-Chien Wu & Wei-Chih Chen, 2018. "A network perspective of engaging patients in specialist and chronic illness care: The 2014 International Health Policy Survey," PLOS ONE, Public Library of Science, vol. 13(8), pages 1-21, August.
    14. Jianxuan Liu & Yanyuan Ma & Lan Wang, 2018. "An alternative robust estimator of average treatment effect in causal inference," Biometrics, The International Biometric Society, vol. 74(3), pages 910-923, September.
    15. Thomas S. Richardson & James M. Robins & Linbo Wang, 2018. "Discussion of “Data-driven confounder selection via Markov and Bayesian networks” by Häggström," Biometrics, The International Biometric Society, vol. 74(2), pages 403-406, June.
    16. Susan M. Shortreed & Ashkan Ertefaie, 2017. "Outcome‐adaptive lasso: Variable selection for causal inference," Biometrics, The International Biometric Society, vol. 73(4), pages 1111-1122, December.
    17. Brandon Koch & David M. Vock & Julian Wolfson, 2018. "Covariate selection with group lasso and doubly robust estimation of causal effects," Biometrics, The International Biometric Society, vol. 74(1), pages 8-17, March.
    18. Huaiyu Zang & Hang J. Kim & Bin Huang & Rhonda Szczesniak, 2023. "Bayesian causal inference for observational studies with missingness in covariates and outcomes," Biometrics, The International Biometric Society, vol. 79(4), pages 3624-3636, December.
    19. Pingel, Ronnie & Waernbaum, Ingeborg, 2015. "Correlation and efficiency of propensity score-based estimators for average causal effects," Working Paper Series 2015:3, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    20. Sean Yiu & Li Su, 2018. "Covariate association eliminating weights: a unified weighting framework for causal effect estimation," Biometrika, Biometrika Trust, vol. 105(3), pages 709-722.
