IDEAS home Printed from https://ideas.repec.org/a/spr/trosos/v16y2022i1d10.1007_s12626-021-00100-w.html

Expectation–Maximization (EM) Clustering as a Preprocessing Method for Clinical Pathway Mining

Author

Listed:
  • Shusaku Tsumoto

    (Shimane University)

  • Tomohiro Kimura

    (Shimane University)

  • Shoji Hirano

    (Shimane University)

Abstract

Hospital information systems (HIS) are service-oriented systems that focus on payment for medical services. Because all HIS coding for diseases and clinical processes is payment-oriented, it may differ from clinicians’ concepts of diseases and processes. HIS in large-scale hospitals in Japan use Diagnostic Procedure Combination (DPC) codes, a disease-coding system that focuses on the use of medical resources. Although DPC codes are very precise for diseases requiring surgery, such as cataracts and lung cancer, classification codes for diseases that do not require surgery, such as cerebral infarction, are less precise, with a single category often covering many subtypes with different clinical courses. This paper proposes a preprocessing method that splits DPC codes into subgroups before dual clustering-based clinical pathway mining is applied. The method applies expectation–maximization (EM) clustering to the length of patient stay in the hospital, using the Akaike Information Criterion (AIC) to select the number of clusters. A dual mining method is then applied to the datasets of the subgroups, and the meanings of the subtype clusters are explored using a text mining method. The proposed method was evaluated as preprocessing for clinical pathway mining using datasets from the HIS at Shimane University Hospital. The experimental results showed that the proposed method correctly generated subgroups from the more generalized DPC codes and that the clinical pathways identified after this preprocessing capture the characteristics of processes in real clinical settings.
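
The preprocessing step described in the abstract (EM clustering on length of stay, with AIC choosing the number of clusters) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional Gaussian components, which the abstract does not state, and the function names (fit_gmm_1d, aic, select_by_aic), the quantile-based initialisation, and the synthetic length-of-stay values are hypothetical choices made only for illustration.

```python
import math
import random


def _e_step(x, weights, means, variances):
    """Return per-point responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for v in x:
        dens = [
            w * math.exp(-((v - m) ** 2) / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
            for w, m, s in zip(weights, means, variances)
        ]
        total = max(sum(dens), 1e-300)  # guard against log(0) for far outliers
        resp.append([d / total for d in dens])
        loglik += math.log(total)
    return resp, loglik


def fit_gmm_1d(x, k, n_iter=200, tol=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (assumed model, see lead-in)."""
    n = len(x)
    xs = sorted(x)
    mean_all = sum(x) / n
    var_all = sum((v - mean_all) ** 2 for v in x) / n
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    floor = 1e-6 * var_all + 1e-12  # variance floor against collapsing components
    prev = -math.inf
    for _ in range(n_iter):
        resp, loglik = _e_step(x, weights, means, variances)
        if loglik - prev < tol:  # stop once the log-likelihood stops improving
            break
        prev = loglik
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, floor)
    _, loglik = _e_step(x, weights, means, variances)
    return weights, means, variances, loglik


def aic(loglik, k):
    """AIC = 2p - 2 log L, with p = 3k - 1 free parameters for a 1-D mixture."""
    return 2.0 * (3 * k - 1) - 2.0 * loglik


def select_by_aic(x, max_k=4):
    """Fit mixtures with 1..max_k components and return the k with the lowest AIC."""
    fits = {}
    for k in range(1, max_k + 1):
        w, m, s, ll = fit_gmm_1d(x, k)
        fits[k] = (aic(ll, k), w, m, s)
    best_k = min(fits, key=lambda kk: fits[kk][0])
    return best_k, fits


# Synthetic lengths of stay in days (hypothetical, not data from the paper):
# one short-stay and one long-stay subgroup under a single code.
rng = random.Random(0)
los_days = [rng.gauss(10, 2) for _ in range(200)] + [rng.gauss(40, 4) for _ in range(200)]
best_k, fits = select_by_aic(los_days)
print("selected number of clusters:", best_k)
```

In this sketch each patient would then be assigned to the component with the highest responsibility, and the resulting subgroups passed on to pathway mining. AIC tends to favour larger models than criteria such as BIC, so the selected number of clusters is best checked against clinical interpretation.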

Suggested Citation

  • Shusaku Tsumoto & Tomohiro Kimura & Shoji Hirano, 2022. "Expectation–Maximization (EM) Clustering as a Preprocessing Method for Clinical Pathway Mining," The Review of Socionetwork Strategies, Springer, vol. 16(1), pages 25-52, April.
  • Handle: RePEc:spr:trosos:v:16:y:2022:i:1:d:10.1007_s12626-021-00100-w
    DOI: 10.1007/s12626-021-00100-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12626-021-00100-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Shusaku Tsumoto & Tomohiro Kimura & Shoji Hirano, 2021. "Determination of Disease from Discharge Summaries," The Review of Socionetwork Strategies, Springer, vol. 15(1), pages 49-66, June.
    2. Shusaku Tsumoto & Tomohiro Kimura & Shoji Hirano, 2021. "Mining Clinical Pathways Using Dual Clustering," The Review of Socionetwork Strategies, Springer, vol. 15(2), pages 287-307, November.
    3. Benaglia, Tatiana & Chauveau, Didier & Hunter, David R. & Young, Derek S., 2009. "mixtools: An R Package for Analyzing Mixture Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 32(i06).
    4. Kim, Ji-Hyun, 2009. "Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3735-3745, September.

