
Learning and diSentangling patient static information from time-series Electronic hEalth Records (STEER)

Author

Listed:
  • Wei Liao
  • Joel Voldman

Abstract

Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. Previous work has shown that self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record (EHR) data to predict patient static information. We found that not only the raw time-series data, but also representations learned by machine learning models, can be used to predict a variety of static information, with area under the receiver operating characteristic curve as high as 0.851 for biological sex, 0.869 for binarized age and 0.810 for self-reported race. Such high predictive performance extends to various comorbidity factors and persists even when the model was trained for different tasks, on different cohorts, and with different model architectures and databases. Given the privacy and fairness concerns these findings pose, we develop a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive information for downstream tasks.

Author summary: It is increasingly apparent that machine learning models for healthcare can predict sensitive information from data that does not explicitly encode it. Well-known examples include self-reported race from various medical imaging modalities, and age and biological sex from retinal fundus images. These findings in turn raise concerns about introducing biases in models or exacerbating health disparities. However, we lack a clear understanding of the extent of the problem (what types of sensitive information can be predicted, and how this generalizes to different models or different datasets) and, critically, of approaches to develop models that can make clinical inferences but not infer sensitive information. Here we go beyond these prior studies and thoroughly investigate the ability of machine learning (ML) models to encode a wide range of patient-sensitive information from time-series EHR data, and then, critically, provide a strategy to mitigate such inferences.
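The abstract reports predictive performance as area under the receiver operating characteristic curve (AUROC). As a reading aid, here is a minimal sketch of how that metric can be computed from a probe's scores, using the standard rank-sum (Mann-Whitney U) identity. This is not the authors' code: the function name `auroc` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (1 = positive class, e.g. a binarized
            static attribute; hypothetical example data, not from the paper)
    scores: sequence of model scores, higher = more likely positive
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Sum the 1-based ranks of the positives, averaging ranks over tied scores.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(pairs[k][1] for k in range(i, j))
        i = j

    # U counts positive/negative pairs ranked correctly (ties count half).
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy usage: four patients, two in the positive class.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.810 to 0.869 should be read.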

Suggested Citation

  • Wei Liao & Joel Voldman, 2024. "Learning and diSentangling patient static information from time-series Electronic hEalth Records (STEER)," PLOS Digital Health, Public Library of Science, vol. 3(10), pages 1-18, October.
  • Handle: RePEc:plo:pdig00:0000640
    DOI: 10.1371/journal.pdig.0000640

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000640
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000640&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pdig.0000640?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sujin Park & Ali Tafti & Galit Shmueli, 2024. "Transporting Causal Effects Across Populations Using Structural Causal Modeling: An Illustration to Work-from-Home Productivity," Information Systems Research, INFORMS, vol. 35(2), pages 686-705, June.
    2. Zheng, Shuwen & Wang, Chong & Zio, Enrico & Liu, Jie, 2024. "Fault detection in complex mechatronic systems by a hierarchical graph convolution attention network based on causal paths," Reliability Engineering and System Safety, Elsevier, vol. 243(C).
    3. Jiahui Liu & Keqiang Fan & Xiaohao Cai & Mahesan Niranjan, 2024. "Few-shot learning for inference in medical imaging with subspace feature representations," PLOS ONE, Public Library of Science, vol. 19(11), pages 1-23, November.
    4. Aly A Valliani & Faris F Gulamali & Young Joon Kwon & Michael L Martini & Chiatse Wang & Douglas Kondziolka & Viola J Chen & Weichung Wang & Anthony B Costa & Eric K Oermann, 2022. "Deploying deep learning models on unseen medical imaging using adversarial domain adaptation," PLOS ONE, Public Library of Science, vol. 17(10), pages 1-17, October.
    5. Mélanie Roschewitz & Galvin Khara & Joe Yearsley & Nisha Sharma & Jonathan J. James & Éva Ambrózay & Adam Heroux & Peter Kecskemethy & Tobias Rijken & Ben Glocker, 2023. "Automatic correction of performance drift under acquisition shift in medical image classification," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
