IDEAS home Printed from https://ideas.repec.org/p/ecl/stabus/3950.html
   My bibliography  Save this paper

Uncovering Interpretable Potential Confounders in Electronic Medical Records

Author

Listed:
  • Zeng, Jiaming

    (Stanford University)

  • Gensheimer, Michael F.

    (Stanford University)

  • Rubin, Daniel L.

    (Stanford University)

  • Athey, Susan

    (Stanford University)

  • Schachter, Ross D.

    (Stanford University)

Abstract

In medicine, randomized clinical trials are the gold standard for informing treatment decisions. Observational comparative effectiveness research is often plagued by selection bias, and expert-selected covariates may not be sufficient to adjust for confounding. We explore how the unstructured clinical text in electronic medical records can be used to reduce selection bias and improve medical practice. We develop a method based on natural language processing to uncover interpretable potential confounders from the clinical text. We validate our method by comparing the hazard ratio (HR) from survival analysis with and without the confounders against the results from established RCTs. We apply our method to four study cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute Research Database and show that our method adjusts the HR estimate towards the RCT results. We further confirm that the uncovered terms can be interpreted by an oncologist as potential confounders. This research helps enable more credible causal inference using data from EMRs, offers a transparent way to improve the design of observational CER, and could inform high-stake medical decisions. Our method can also be applied to studies within and beyond medicine to extract important information from observational data to support decisions.

Suggested Citation

  • Zeng, Jiaming & Gensheimer, Michael F. & Rubin, Daniel L. & Athey, Susan & Schachter, Ross D., 2021. "Uncovering Interpretable Potential Confounders in Electronic Medical Records," Research Papers 3950, Stanford University, Graduate School of Business.
  • Handle: RePEc:ecl:stabus:3950
    as

    Download full text from publisher

    File URL: https://www.gsb.stanford.edu/faculty-research/working-papers/uncovering-interpretable-potential-confounders-electronic-medical
    Download Restriction: no
    ---><---

    Other versions of this item:

    References listed on IDEAS

    as
    1. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    2. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 29-50, Spring.
    3. Mozer, Reagan & Miratrix, Luke & Kaufman, Aaron Russell & Jason Anastasopoulos, L., 2020. "Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality," Political Analysis, Cambridge University Press, vol. 28(4), pages 445-468, October.
    4. Rajeev H. Dehejia & Sadek Wahba, 2002. "Propensity Score-Matching Methods For Nonexperimental Causal Studies," The Review of Economics and Statistics, MIT Press, vol. 84(1), pages 151-161, February.
    5. Abadie, Alberto & Imbens, Guido W., 2011. "Bias-Corrected Matching Estimators for Average Treatment Effects," Journal of Business & Economic Statistics, American Statistical Association, vol. 29(1), pages 1-11.
    6. Margaret E. Roberts & Brandon M. Stewart & Richard A. Nielsen, 2020. "Adjusting for Confounding with Text Matching," American Journal of Political Science, John Wiley & Sons, vol. 64(4), pages 887-903, October.
    7. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    8. Imbens,Guido W. & Rubin,Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881.
    9. Ho, Daniel & Imai, Kosuke & King, Gary & Stuart, Elizabeth A., 2011. "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 42(i08).
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Takanobu Hirosawa & Yukinori Harada & Masashi Yokose & Tetsu Sakamoto & Ren Kawamura & Taro Shimizu, 2023. "Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study," IJERPH, MDPI, vol. 20(4), pages 1-10, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    2. Susan Athey & Guido W. Imbens & Jonas Metzger & Evan M. Munro, 2019. "Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations," NBER Working Papers 26566, National Bureau of Economic Research, Inc.
    3. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    4. Sallin, Aurelién, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    5. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials [Microcredit impacts: Evidence from a randomized microcredit program placement experiment by Compartamos Banco]," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    6. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    7. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    8. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    9. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Mar 2024.
    10. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    11. Koch, Nicolas & Basse Mama, Houdou, 2019. "Does the EU Emissions Trading System induce investment leakage? Evidence from German multinational firms," Energy Economics, Elsevier, vol. 81(C), pages 479-492.
    12. Huang, Wei & Li, Jinxian & Zhang, Qiang, 2019. "Information asymmetry, legal environment, and family firm governance: Evidence from IPO underpricing in China," Pacific-Basin Finance Journal, Elsevier, vol. 57(C).
    13. Aur'elien Sallin, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Papers 2110.08807, arXiv.org, revised Feb 2022.
    14. Ferman, Bruno, 2021. "Matching estimators with few treated and many control observations," Journal of Econometrics, Elsevier, vol. 225(2), pages 295-307.
    15. Harsh Parikh & Cynthia Rudin & Alexander Volfovsky, 2018. "MALTS: Matching After Learning to Stretch," Papers 1811.07415, arXiv.org, revised Jun 2023.
    16. Harsh Parikh & Carlos Varjao & Louise Xu & Eric Tchetgen Tchetgen, 2022. "Validating Causal Inference Methods," Papers 2202.04208, arXiv.org, revised Jul 2022.
    17. Goenner, Cullen F, 2016. "The policy impact of new rules for loan participation on credit union returns," Journal of Banking & Finance, Elsevier, vol. 73(C), pages 198-210.
    18. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Jan 2023.
    19. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    20. Zhexiao Lin & Peng Ding & Fang Han, 2021. "Estimation based on nearest neighbor matching: from density ratio to average treatment effect," Papers 2112.13506, arXiv.org.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:ecl:stabus:3950. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: the person in charge (email available below). General contact details of provider: https://edirc.repec.org/data/gsstaus.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.