Author
Listed:
- Biniyam Gebeyehu
- Bennett Kleinberg
- Katrijn Van Deun
- Esther de Vries
Abstract
Background: Routine healthcare data are increasingly stored in electronic health records (EHRs), presenting an opportunity to leverage machine learning (ML) for detecting and predicting medical events. While medical experts are optimistic about expanding the applications of ML, several caveats are often overlooked. Many medical outcomes are categorical (e.g., a diagnosis is present or absent), and the categories are often considerably unequal in size, which might significantly affect the performance of ML algorithms. Detecting small subgroups in EHR data, so-called anomaly detection, is an emerging approach, yet organized documentation on current practices remains scarce. This scoping review examines medical anomaly detection based on routine healthcare data stored in EHRs and formulates alternative approaches where suboptimal practices were noticed.
Methods: PubMed and Web of Science were searched up to September 5, 2024. Peer-reviewed articles and conference papers on ML-based medical anomaly detection in EHR data were included. Fifty-two study characteristics were extracted and analyzed both quantitatively and qualitatively.
Results: A total of 117 studies met the inclusion criteria. The cross-study median proportion of the anomalous class was 0.079 (range 0.00045–0.23). Key details, such as data preprocessing steps, were often incompletely reported; 14.5% of studies (n = 17) provided no information on preprocessing. Only four studies reported the underlying cause of missingness before deciding how to handle it, and just three considered the clinical implications of false positives and false negatives when evaluating anomaly detection performance.
Conclusion: The current medical anomaly detection literature needs to pay greater attention to reporting details on preprocessing, the handling of missing data, and the use of performance metrics. With the increasing number of anomaly detection studies based on routine healthcare data stored in EHRs, more focus is needed on implementation and reporting practices to ensure the relevance and reproducibility of future studies in this field.
Suggested Citation
Biniyam Gebeyehu & Bennett Kleinberg & Katrijn Van Deun & Esther de Vries, 2026.
"Detection of rare medical events in electronic health records using machine learning: Current practices and suggestions – A scoping review,"
PLOS ONE, Public Library of Science, vol. 21(3), pages 1-19, March.
Handle:
RePEc:plo:pone00:0332963
DOI: 10.1371/journal.pone.0332963