Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-56625-z.html

Deep representation learning for clustering longitudinal survival data from electronic health records

Authors
  • Jiajun Qiu

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Yao Hu

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Li Li

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Abdullah Mesut Erzurumluoglu

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Ingrid Braenne

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Charles Whitehurst

    (Boehringer-Ingelheim)

  • Jochen Schmitz

    (Boehringer-Ingelheim)

  • Jatin Arora

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Boris Alexander Bartholdy

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Shrey Gandhi

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Pierre Khoueiry

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Stefanie Mueller

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Boris Noyvert

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Zhihao Ding

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Jan Nygaard Jensen

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

  • Johann Jong

    (Boehringer Ingelheim Pharma GmbH & Co. KG)

Abstract

Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn’s disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn’s disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.
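The abstract describes VaDeSC-EHR only at a high level. The sketch below is a minimal illustration of how a VaDeSC-style model can score a patient's cluster membership. It assumes the formulation of the original VaDeSC model as we understand it: a Gaussian mixture prior over latent embeddings z, and a cluster-specific Weibull survival model whose scale is softplus(z·β_c). Under those assumptions the posterior is p(c | z, t, δ) ∝ π_c · N(z; μ_c, diag σ_c²) · f(t | z, c)^δ · S(t | z, c)^(1−δ), where δ = 1 marks an observed event and δ = 0 a censored time. In VaDeSC-EHR, z would come from the transformer encoder over diagnosis trajectories; here it is supplied directly. All function names, parameters and toy values are hypothetical and are not the authors' implementation.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the Weibull scale positive.
    return np.logaddexp(0.0, x)

def weibull_log_lik(t, delta, scale, shape):
    """Log-likelihood of a possibly right-censored time t under Weibull(scale, shape).

    delta = 1 (event observed) -> log density; delta = 0 (censored) -> log survival.
    """
    log_surv = -((t / scale) ** shape)
    log_pdf = np.log(shape / scale) + (shape - 1.0) * np.log(t / scale) + log_surv
    return delta * log_pdf + (1 - delta) * log_surv

def cluster_posterior(z, t, delta, weights, mu, var, beta, shape):
    """Hypothetical helper: p(c | z, t, delta) for K clusters.

    z: (d,) latent embedding; weights: (K,) mixture weights;
    mu, var: (K, d) Gaussian means / diagonal variances; beta: (K, d) survival weights.
    """
    log_prior = np.log(weights)
    # Diagonal Gaussian log-density of z under each cluster, shape (K,)
    log_gauss = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var, axis=1)
    scale = softplus(beta @ z)  # one Weibull scale per cluster, shape (K,)
    log_joint = log_prior + log_gauss + weibull_log_lik(t, delta, scale, shape)
    log_joint -= log_joint.max()  # subtract max before exp to avoid underflow
    post = np.exp(log_joint)
    return post / post.sum()

# Toy example (hypothetical values): K = 2 clusters in a 3-dimensional latent space.
mu = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]])
var = np.ones((2, 3))
beta = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]])
z = np.array([2.9, 3.1, 3.0])  # embedding close to the second cluster mean
post = cluster_posterior(z, t=1.5, delta=1, weights=np.array([0.5, 0.5]),
                         mu=mu, var=var, beta=beta, shape=1.2)
print(post.round(4))
```

Because z lies near the second cluster mean, almost all posterior mass falls on that cluster. The survival term matters more when clusters overlap in latent space, which is the situation the abstract highlights: subgroups that look similar in diagnosis trajectories but differ in event risk.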

Suggested Citation

  • Jiajun Qiu & Yao Hu & Li Li & Abdullah Mesut Erzurumluoglu & Ingrid Braenne & Charles Whitehurst & Jochen Schmitz & Jatin Arora & Boris Alexander Bartholdy & Shrey Gandhi & Pierre Khoueiry & Stefanie Mueller & Boris Noyvert & Zhihao Ding & Jan Nygaard Jensen & Johann Jong, 2025. "Deep representation learning for clustering longitudinal survival data from electronic health records," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56625-z
    DOI: 10.1038/s41467-025-56625-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-56625-z
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Weiqiang Shen & Chuanlin Zhang & Xiaona Zhang & Jinglun Shi, 2019. "A fully distributed deployment algorithm for underwater strong k-barrier coverage using mobile sensors," International Journal of Distributed Sensor Networks, vol. 15(4), pages 15501477198, April.
    2. Bo Cowgill & Jonathan M. V. Davis & B. Pablo Montagnes & Patryk Perkowski, 2025. "Stable Matching on the Job? Theory and Evidence on Internal Talent Markets," Management Science, INFORMS, vol. 71(3), pages 2508-2526, March.
    3. András Frank, 2005. "On Kuhn's Hungarian Method—A tribute from Hungary," Naval Research Logistics (NRL), John Wiley & Sons, vol. 52(1), pages 2-5, February.
    4. Weihua Yang & Xu Zhang & Xia Wang, 2024. "The Wasserstein Metric between a Discrete Probability Measure and a Continuous One," Mathematics, MDPI, vol. 12(15), pages 1-13, July.
    5. Amit Kumar & Anila Gupta, 2013. "Mehar’s methods for fuzzy assignment problems with restrictions," Fuzzy Information and Engineering, Springer, vol. 5(1), pages 27-44, March.
    6. Nisse, Nicolas & Salch, Alexandre & Weber, Valentin, 2023. "Recovery of disrupted airline operations using k-maximum matching in graphs," European Journal of Operational Research, Elsevier, vol. 309(3), pages 1061-1072.
    7. Parvin Ahmadi & Iman Gholampour & Mahmoud Tabandeh, 2018. "Cluster-based sparse topical coding for topic mining and document clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 537-558, September.
    8. Bachtenkirch, David & Bock, Stefan, 2022. "Finding efficient make-to-order production and batch delivery schedules," European Journal of Operational Research, Elsevier, vol. 297(1), pages 133-152.
    9. Samyajoy Pal & Christian Heumann, 2025. "Revisiting Dirichlet Mixture Model: unraveling deeper insights and practical applications," Statistical Papers, Springer, vol. 66(1), pages 1-38, January.
    10. Omar Zatarain & Jesse Yoe Rumbo-Morales & Silvia Ramos-Cabral & Gerardo Ortíz-Torres & Felipe d. J. Sorcia-Vázquez & Iván Guillén-Escamilla & Juan Carlos Mixteco-Sánchez, 2023. "A Method for Perception and Assessment of Semantic Textual Similarities in English," Mathematics, MDPI, vol. 11(12), pages 1-20, June.
    11. Chenchen Ma & Jing Ouyang & Gongjun Xu, 2023. "Learning Latent and Hierarchical Structures in Cognitive Diagnosis Models," Psychometrika, Springer;The Psychometric Society, vol. 88(1), pages 175-207, March.
    12. Winker, Peter, 2023. "Visualizing Topic Uncertainty in Topic Modelling," VfS Annual Conference 2023 (Regensburg): Growth and the "sociale Frage" 277584, Verein für Socialpolitik / German Economic Association.
    13. Robert M. Curry & Joseph Foraker & Gary Lazzaro & David M. Ruth, 2024. "Practice Summary: Optimal Student Group Reassignment at U.S. Naval Academy," Interfaces, INFORMS, vol. 54(3), pages 205-210, May.
    14. Aidin Rezaeian & Hamidreza Koosha & Mohammad Ranjbar & Saeed Poormoaied, 2024. "The assignment of project managers to projects in an uncertain dynamic environment," Annals of Operations Research, Springer, vol. 341(2), pages 1107-1134, October.
    15. Tran Hoang Hai, 2020. "Estimation of volatility causality in structural autoregressions with heteroskedasticity using independent component analysis," Statistical Papers, Springer, vol. 61(1), pages 1-16, February.
    16. Delafield, Gemma & Smith, Greg S. & Day, Brett & Holland, Robert A. & Donnison, Caspar & Hastings, Astley & Taylor, Gail & Owen, Nathan & Lovett, Andrew, 2024. "Spatial context matters: Assessing how future renewable energy pathways will impact nature and society," Renewable Energy, Elsevier, vol. 220(C).
    17. P. Senthil Kumar & R. Jahir Hussain, 2016. "A Simple Method for Solving Fully Intuitionistic Fuzzy Real Life Assignment Problem," International Journal of Operations Research and Information Systems (IJORIS), IGI Global, vol. 7(2), pages 39-61, April.
    18. Caplin, Andrew & Leahy, John, 2020. "Comparative statics in markets for indivisible goods," Journal of Mathematical Economics, Elsevier, vol. 90(C), pages 80-94.
    19. Biró, Péter & Gudmundsson, Jens, 2021. "Complexity of finding Pareto-efficient allocations of highest welfare," European Journal of Operational Research, Elsevier, vol. 291(2), pages 614-628.
    20. Sallam, Gamal & Baroudi, Uthman, 2020. "A two-stage framework for fair autonomous robot deployment using virtual forces," Transportation Research Part A: Policy and Practice, Elsevier, vol. 141(C), pages 35-50.

