Author
Listed:
- Arshiya Mariam
- Hamed Javidi
- Emily C Zabor
- Ran Zhao
- Tomas Radivoyevitch
- Daniel M Rotroff
Abstract
Longitudinal electronic health records (EHR) can be used to identify patterns of disease development and progression in real-world settings. Unsupervised temporal matching algorithms are being repurposed for EHR from signal-processing and protein-sequence alignment tasks, where they have shown immense promise for gaining insight into disease. The robustness of these algorithms for classifying EHR clinical data remains to be determined. Time series compiled from clinical measurements, such as blood pressure, have far more irregularity in sampling and missingness than the data for which these algorithms were developed, necessitating a systematic evaluation of these methods. We applied 30 state-of-the-art unsupervised machine learning algorithms to 6,912 systematically generated simulated clinical datasets across five parameters. These algorithms included eight temporal matching algorithms with fourteen partitional and eight fuzzy clustering methods. Nemenyi tests were used to determine differences in accuracy using the Adjusted Rand Index (ARI). Dynamic time warping and its lower-bound variants had the highest accuracies across all cohorts (median ARI>0.70). All 30 methods were better at discriminating classes that differed in magnitude than classes that differed in trajectory shape. Missingness impacted accuracies only when classes differed by trajectory shape. The method with the highest ARI was then used to cluster a large pediatric metabolic syndrome (MetS) cohort (N = 43,426). We identified three unique childhood BMI patterns with high average cluster consensus (>70%). The algorithm identified a cluster with consistently high BMI which had the greatest risk of MetS, consistent with prior literature (OR = 4.87, 95% CI: 3.93–6.12).
While these algorithms have been shown to have similar accuracies for regular time series, their accuracies in clinical applications vary substantially when discriminating differences in shape, especially with moderate to high missingness (>10%). This systematic assessment also shows that the most robust algorithms tested here can derive meaningful insights from longitudinal clinical data.
Author summary
Clinical data is regularly recorded in patients' health records by healthcare institutions and is becoming increasingly available for research to identify clinically meaningful subgroups that can help drive developments in precision medicine. Clustering methods from other domains, such as audio signal processing, are being repurposed for these tasks; however, clinical data has its own unique characteristics, such as missing data and specific correlation structures, that may impact the performance of certain clustering methods. Here, using a large simulated dataset that we developed from real patient data, our objective is to establish which approaches are best at stratifying patients using longitudinal clinical data. We identified dynamic time warping (DTW) and its lower-bound variants as highly robust clustering algorithms that showed impressive performance at classifying patients based on variations in trajectory shapes and trajectory magnitudes. We also demonstrate, using a real cohort of >43,000 pediatric patients, that DTW can classify BMI trajectories to identify patients at elevated risk of developing pediatric metabolic syndrome. Our study provides insights into the robustness of these algorithms and their use in identifying novel patterns in the clinical domain.
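The two quantities the abstract relies on, a dynamic time warping (DTW) distance and the Adjusted Rand Index (ARI), can be illustrated with a small sketch. This is not the authors' pipeline: the trajectory values are invented, the function names `dtw_distance` and `adjusted_rand_index` are hypothetical, and the "clustering" step is simplified to nearest-reference assignment rather than the partitional or fuzzy methods evaluated in the paper.

```python
from collections import Counter
from math import comb, inf


def dtw_distance(a, b):
    """Classic DTW distance (absolute-difference cost) between two sequences.

    The sequences may have different lengths, which is why DTW is attractive
    for irregularly sampled clinical measurements.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def adjusted_rand_index(truth, pred):
    """ARI between two labelings, computed from the contingency table."""
    total_pairs = comb(len(truth), 2)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Invented BMI-like trajectories of unequal length (assumption, not study data).
series = [
    [30, 31, 32, 33], [29, 31, 33], [31, 32, 32, 34, 35],   # higher magnitude
    [18, 19, 19, 20], [17, 19, 20], [18, 18, 19, 21, 21],   # lower magnitude
]
truth = [0, 0, 0, 1, 1, 1]

# Simplified assignment: label each series by its nearest reference trajectory.
references = [series[0], series[3]]
pred = [min(range(len(references)),
            key=lambda k: dtw_distance(s, references[k]))
        for s in series]

print(pred, adjusted_rand_index(truth, pred))
```

On this toy input the two groups differ in magnitude, the easier case according to the abstract; separating groups that differ only in trajectory shape, or adding missing values, would be a harder test of the same functions.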
Suggested Citation
Arshiya Mariam & Hamed Javidi & Emily C Zabor & Ran Zhao & Tomas Radivoyevitch & Daniel M Rotroff, 2024.
"Unsupervised clustering of longitudinal clinical measurements in electronic health records,"
PLOS Digital Health, Public Library of Science, vol. 3(10), pages 1-20, October.
Handle:
RePEc:plo:pdig00:0000628
DOI: 10.1371/journal.pdig.0000628
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Hawon Chu & Jaeseong Kim & Seounghyeon Kim & Young-Kyoon Suh & Ryong Lee & Rae-Young Jang & Minwoo Park, 2020.
"ST-Trie: A Novel Indexing Scheme for Efficiently Querying Heterogeneous, Spatiotemporal IoT Data,"
Sustainability, MDPI, vol. 12(22), pages 1-21, November.
- Alfonso Marino & Paolo Pariso & Michele Picariello, 2023.
"Energy use and End-use Technologies: Organizational and Energy Analysis in Italian Hospitals,"
International Journal of Energy Economics and Policy, Econjournals, vol. 13(3), pages 36-45, May.
- Guoquan Zhang & Guohao Li & Jing Peng, 2020.
"Risk Assessment and Monitoring of Green Logistics for Fresh Produce Based on a Support Vector Machine,"
Sustainability, MDPI, vol. 12(18), pages 1-20, September.
- Jorge Morato & Sonia Sanchez-Cuadrado & Ana Iglesias & Adrián Campillo & Carmen Fernández-Panadero, 2021.
"Sustainable Technologies for Older Adults,"
Sustainability, MDPI, vol. 13(15), pages 1-35, July.