
Determination of Disease from Discharge Summaries

Author

Listed:
  • Shusaku Tsumoto

    (Shimane University)

  • Tomohiro Kimura

    (Shimane University)

  • Shoji Hirano

    (Shimane University)

Abstract

Determining whether discharge summaries include the correct disease codes is important for hospital management, because submitting medical receipts with incorrect disease codes can result in lost insurance reimbursement. Medical information managers in large hospitals must evaluate more than 1000 summaries per month, so automated checking of discharge summaries would reduce their workload and let them focus on complicated cases. This paper proposes a method for constructing classifiers of discharge summaries. First, morphological analysis generates a term matrix from text data extracted from the hospital information system. Next, important keywords are selected by correspondence analysis, training examples are generated, and machine learning methods are applied to those examples. Several machine learning methods were compared using discharge summaries stored in the information system of Shimane University Hospital. The random forest method was the best classifier among those compared (deep learning, SVM, and decision tree methods), with a classification accuracy greater than 90%.
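
The preprocessing steps named in the abstract (term matrix, keyword selection, training examples) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes whitespace tokenization as a stand-in for morphological analysis, and it ranks terms with a plain chi-square score on the term-by-class contingency table as a simplified substitute for correspondence analysis. The summaries, the class labels, and all function names are invented for illustration.

```python
from collections import Counter

# Toy, invented summaries. Labels are hypothetical placeholders, not real disease codes.
SUMMARIES = [
    ("chest pain stent placed discharged stable", "cardiac"),
    ("chest pain ecg abnormal stent placed", "cardiac"),
    ("cough fever antibiotics given discharged stable", "respiratory"),
    ("fever cough oxygen antibiotics given", "respiratory"),
]


def tokenize(text):
    # Assumption: whitespace split stands in for a real morphological analyzer.
    return text.lower().split()


def build_term_matrix(docs):
    """Return (vocabulary, document-by-term count matrix)."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(doc)).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def chi_square_scores(vocab, matrix, labels):
    """Chi-square statistic of each term's presence/absence against the class."""
    n = len(labels)
    classes = sorted(set(labels))
    scores = {}
    for j, term in enumerate(vocab):
        present = [row[j] > 0 for row in matrix]
        n_present = sum(present)
        score = 0.0
        for cls in classes:
            n_class = labels.count(cls)
            for flag, n_flag in ((True, n_present), (False, n - n_present)):
                expected = n_flag * n_class / n
                if expected == 0:
                    continue  # empty margin contributes nothing
                observed = sum(
                    1 for p, y in zip(present, labels) if p == flag and y == cls
                )
                score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_keywords(scores, k):
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def to_training_examples(vocab, matrix, labels, keywords):
    """Binary presence vectors over the selected keywords, paired with labels."""
    cols = [vocab.index(term) for term in keywords]
    return [
        ([1 if row[j] > 0 else 0 for j in cols], label)
        for row, label in zip(matrix, labels)
    ]


docs = [text for text, _ in SUMMARIES]
labels = [label for _, label in SUMMARIES]
vocab, matrix = build_term_matrix(docs)
scores = chi_square_scores(vocab, matrix, labels)
keywords = select_keywords(scores, k=4)
examples = to_training_examples(vocab, matrix, labels, keywords)
print(keywords)
for features, label in examples:
    print(features, label)
```

The resulting feature vectors are the kind of training examples that could then be passed to any classifier, such as a random forest, an SVM, or a decision tree. Terms that appear equally often in both classes (here "discharged" and "stable") score zero and are dropped.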

Suggested Citation

  • Shusaku Tsumoto & Tomohiro Kimura & Shoji Hirano, 2021. "Determination of Disease from Discharge Summaries," The Review of Socionetwork Strategies, Springer, vol. 15(1), pages 49-66, June.
  • Handle: RePEc:spr:trosos:v:15:y:2021:i:1:d:10.1007_s12626-021-00076-7
    DOI: 10.1007/s12626-021-00076-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12626-021-00076-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Karatzoglou, Alexandros & Smola, Alexandros & Hornik, Kurt & Zeileis, Achim, 2004. "kernlab - An S4 Package for Kernel Methods in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i09).
    2. Kim, Ji-Hyun, 2009. "Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3735-3745, September.

    Citations

    Cited by:

    1. Shusaku Tsumoto & Tomohiro Kimura & Shoji Hirano, 2022. "Expectation–Maximization (EM) Clustering as a Preprocessing Method for Clinical Pathway Mining," The Review of Socionetwork Strategies, Springer, vol. 16(1), pages 25-52, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tsukioka, Yasutomo & Yanagi, Junya & Takada, Teruko, 2018. "Investor sentiment extracted from internet stock message boards and IPO puzzles," International Review of Economics & Finance, Elsevier, vol. 56(C), pages 205-217.
    2. Daniel J. Luckett & Eric B. Laber & Samer S. El‐Kamary & Cheng Fan & Ravi Jhaveri & Charles M. Perou & Fatma M. Shebl & Michael R. Kosorok, 2021. "Receiver operating characteristic curves and confidence bands for support vector machines," Biometrics, The International Biometric Society, vol. 77(4), pages 1422-1430, December.
    3. Grabisch, Michel & Kojadinovic, Ivan & Meyer, Patrick, 2008. "A review of methods for capacity identification in Choquet integral based multi-attribute utility theory: Applications of the Kappalab R package," European Journal of Operational Research, Elsevier, vol. 186(2), pages 766-785, April.
    4. Bellotti, Anthony & Brigo, Damiano & Gambetti, Paolo & Vrins, Frédéric, 2021. "Forecasting recovery rates on non-performing loans with machine learning," International Journal of Forecasting, Elsevier, vol. 37(1), pages 428-444.
    5. Riza, Lala Septem & Bergmeir, Christoph & Herrera, Francisco & Benítez, José M., 2015. "frbs: Fuzzy Rule-Based Systems for Classification and Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 65(i06).
    6. Karin Wolffhechel & Amanda C Hahn & Hanne Jarmer & Claire I Fisher & Benedict C Jones & Lisa M DeBruine, 2015. "Testing the Utility of a Data-Driven Approach for Assessing BMI from Face Images," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-10, October.
    7. Mark G E White & Neil E Bezodis & Jonathon Neville & Huw Summers & Paul Rees, 2022. "Determining jumping performance from a single body-worn accelerometer using machine learning," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-25, February.
    8. Airola, Antti & Pahikkala, Tapio & Waegeman, Willem & De Baets, Bernard & Salakoski, Tapio, 2011. "An experimental comparison of cross-validation techniques for estimating the area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1828-1844, April.
    9. Andrea S Martinez-Vernon & James A Covington & Ramesh P Arasaradnam & Siavash Esfahani & Nicola O’Connell & Ioannis Kyrou & Richard S Savage, 2018. "An improved machine learning pipeline for urinary volatiles disease detection: Diagnosing diabetes," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-20, September.
    10. Khamma, Thulasi Ram & Zhang, Yuming & Guerrier, Stéphane & Boubekri, Mohamed, 2020. "Generalized additive models: An efficient method for short-term energy prediction in office buildings," Energy, Elsevier, vol. 213(C).
    11. Madhumita Sahoo & Aman Kasot & Anirban Dhar & Amlanjyoti Kar, 2018. "On Predictability of Groundwater Level in Shallow Wells Using Satellite Observations," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 32(4), pages 1225-1244, March.
    12. P. J. Zarco-Tejada & T. Poblete & C. Camino & V. Gonzalez-Dugo & R. Calderon & A. Hornero & R. Hernandez-Clemente & M. Román-Écija & M. P. Velasco-Amo & B. B. Landa & P. S. A. Beck & M. Saponari & D. , 2021. "Divergent abiotic spectral pathways unravel pathogen stress signals across species," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    13. Grubinger, Thomas & Zeileis, Achim & Pfeiffer, Karl-Peter, 2014. "evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 61(i01).
    14. Matthias Schmid & Thomas Hielscher & Thomas Augustin & Olaf Gefeller, 2011. "A Robust Alternative to the Schemper–Henderson Estimator of Prediction Error," Biometrics, The International Biometric Society, vol. 67(2), pages 524-535, June.
    15. Uwe Ligges & Sebastian Krey, 2011. "Feature clustering for instrument classification," Computational Statistics, Springer, vol. 26(2), pages 279-291, June.
    16. Arnout Van Messem & Andreas Christmann, 2010. "A review on consistency and robustness properties of support vector machines for heavy-tailed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 199-220, September.
    17. Luts, Jan & Ormerod, John T., 2014. "Mean field variational Bayesian inference for support vector machine classification," Computational Statistics & Data Analysis, Elsevier, vol. 73(C), pages 163-176.
    18. David Rios Insua & Roi Naveiro & Victor Gallego, 2020. "Perspectives on Adversarial Classification," Mathematics, MDPI, vol. 8(11), pages 1-21, November.
    19. Nunes, Matthew, 2015. "Statistical Analysis of Network Data with R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 66(b01).
    20. Ana Patrícia Rocha & Hugo Miguel Pereira Choupina & Maria do Carmo Vilas-Boas & José Maria Fernandes & João Paulo Silva Cunha, 2018. "System for automatic gait analysis based on a single RGB-D camera," PLOS ONE, Public Library of Science, vol. 13(8), pages 1-24, August.
