Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-56625-z.html

Deep representation learning for clustering longitudinal survival data from electronic health records

Author

Listed:
  • Jiajun Qiu

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Yao Hu

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Li Li

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Abdullah Mesut Erzurumluoglu

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Ingrid Braenne

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Charles Whitehurst

    (Boehringer-Ingelheim)

  • Jochen Schmitz

    (Boehringer-Ingelheim)

  • Jatin Arora

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Boris Alexander Bartholdy

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Shrey Gandhi

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Pierre Khoueiry

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Stefanie Mueller

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Boris Noyvert

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Zhihao Ding

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Jan Nygaard Jensen

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Johann Jong

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

Abstract

Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn’s disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn’s disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.

Suggested Citation

  • Jiajun Qiu & Yao Hu & Li Li & Abdullah Mesut Erzurumluoglu & Ingrid Braenne & Charles Whitehurst & Jochen Schmitz & Jatin Arora & Boris Alexander Bartholdy & Shrey Gandhi & Pierre Khoueiry & Stefanie , 2025. "Deep representation learning for clustering longitudinal survival data from electronic health records," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56625-z
    DOI: 10.1038/s41467-025-56625-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-56625-z
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. András Frank, 2005. "On Kuhn's Hungarian Method—A tribute from Hungary," Naval Research Logistics (NRL), John Wiley & Sons, vol. 52(1), pages 2-5, February.
    2. Amit Kumar & Anila Gupta, 2013. "Mehar’s methods for fuzzy assignment problems with restrictions," Fuzzy Information and Engineering, Springer, vol. 5(1), pages 27-44, March.
    3. Jujiao Kang & Yue-Ting Deng & Bang-Sheng Wu & Wei-Shi Liu & Ze-Yu Li & Shitong Xiang & Liu Yang & Jia You & Xiaohong Gong & Tianye Jia & Jin-Tai Yu & Wei Cheng & Jianfeng Feng, 2024. "Whole exome sequencing analysis identifies genes for alcohol consumption," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Parvin Ahmadi & Iman Gholampour & Mahmoud Tabandeh, 2018. "Cluster-based sparse topical coding for topic mining and document clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 537-558, September.
    5. Leslie A. Smith & James A. Cahill & Ji-Hyun Lee & Kiley Graim, 2025. "Equitable machine learning counteracts ancestral bias in precision medicine," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    6. Chenchen Ma & Jing Ouyang & Gongjun Xu, 2023. "Learning Latent and Hierarchical Structures in Cognitive Diagnosis Models," Psychometrika, Springer;The Psychometric Society, vol. 88(1), pages 175-207, March.
    7. Aidin Rezaeian & Hamidreza Koosha & Mohammad Ranjbar & Saeed Poormoaied, 2024. "The assignment of project managers to projects in an uncertain dynamic environment," Annals of Operations Research, Springer, vol. 341(2), pages 1107-1134, October.
    8. Tran Hoang Hai, 2020. "Estimation of volatility causality in structural autoregressions with heteroskedasticity using independent component analysis," Statistical Papers, Springer, vol. 61(1), pages 1-16, February.
    9. Samvida S. Venkatesh & Habib Ganjgahi & Duncan S. Palmer & Kayesha Coley & Gregorio V. Linchangco & Qin Hui & Peter Wilson & Yuk-Lam Ho & Kelly Cho & Kadri Arumäe & Laura B. L. Wittemans & Christoffer, 2024. "Characterising the genetic architecture of changes in adiposity during adulthood using electronic health records," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    10. Caplin, Andrew & Leahy, John, 2020. "Comparative statics in markets for indivisible goods," Journal of Mathematical Economics, Elsevier, vol. 90(C), pages 80-94.
    11. Biró, Péter & Gudmundsson, Jens, 2021. "Complexity of finding Pareto-efficient allocations of highest welfare," European Journal of Operational Research, Elsevier, vol. 291(2), pages 614-628.
    12. Ana Viana & Xenia Klimentova & Péter Biró & Flip Klijn, 2021. "Shapley-Scarf Housing Markets: Respecting Improvement, Integer Programming, and Kidney Exchange," Working Papers 1235, Barcelona School of Economics.
    13. Michal Brylinski, 2014. "eMatchSite: Sequence Order-Independent Structure Alignments of Ligand Binding Pockets in Protein Models," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-15, September.
    14. Juan G. Carvajal-Patiño & Vincent Mallet & David Becerra & Luis Fernando Niño Vasquez & Carlos Oliver & Jérôme Waldispühl, 2025. "RNAmigos2: accelerated structure-based RNA virtual screening with deep graph learning," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    15. Chiwei Yan & Helin Zhu & Nikita Korolko & Dawn Woodard, 2020. "Dynamic pricing and matching in ride‐hailing platforms," Naval Research Logistics (NRL), John Wiley & Sons, vol. 67(8), pages 705-724, December.
    16. Fanrong Xie & Anuj Sharma & Zuoan Li, 2022. "An alternate approach to solve two-level priority based assignment problem," Computational Optimization and Applications, Springer, vol. 81(2), pages 613-656, March.
    17. Yan, Pengyu & Lee, Chung-Yee & Chu, Chengbin & Chen, Cynthia & Luo, Zhiqin, 2021. "Matching and pricing in ride-sharing: Optimality, stability, and financial sustainability," Omega, Elsevier, vol. 102(C).
    18. Morrill, Thayer & Roth, Alvin E., 2024. "Top trading cycles," Journal of Mathematical Economics, Elsevier, vol. 112(C).
    19. Bo Cowgill & Jonathan M. V. Davis & B. Pablo Montagnes & Patryk Perkowski, 2025. "Stable Matching on the Job? Theory and Evidence on Internal Talent Markets," Management Science, INFORMS, vol. 71(3), pages 2508-2526, March.
    20. Guo, Yuhan & Zhang, Yu & Boulaksil, Youssef, 2021. "Real-time ride-sharing framework with dynamic timeframe and anticipation-based migration," European Journal of Operational Research, Elsevier, vol. 288(3), pages 810-828.
