
Mining ICDDR, B Hospital Surveillance Data and Exhibiting Strategies for Balancing Large Unbalanced Datasets

Author

Listed:
  • Adnan Firoze

    (School of Engineering and Applied Science (SEAS), Columbia University, New York City, NY, USA)

  • Rashedur M. Rahman

    (Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh)

Abstract

This research applies a number of classifier models to hospital surveillance data to classify admitted patients according to how critical their condition is. Three class labels distinguish the criticality of the admitted patients. Because instances of the class 'low' are significantly more frequent than those of the other two classes, the work sets forth two distinct approaches to the over-fitting problem in the unbalanced dataset. Apart from trimming the dataset to balance the classes, it addresses over-fitting by introducing the 'Synthetic Minority Over-sampling Technique' (SMOTE) algorithm coupled with Locally Linear Embedding (LLE). It constructs three models that apply neural network and multinomial logistic regression classification, and compares their performance with the models developed by Rahman and Hasan (2011), who used several decision tree models to classify the same dataset with tenfold cross-validation. Additionally, for a comprehensive comparative analysis, it compares the classification performance of the authors' novel third model using a support vector machine (SVM). The comparison shows that one of the authors' models surpasses all prior models in classification performance once the trade-off with running time is taken into account, yielding an efficient model that handles large-scale unbalanced datasets with standard classification performance. The models developed in this research can become valuable tools for doctors when large numbers of patients arrive in a short interval, especially during epidemics. Since machine intervention becomes a necessity when doctors are scarce, computer applications powered by these models can help diagnose newly arrived patients and measure their criticality using the historical data kept in the surveillance database.
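The two balancing ideas named in the abstract, trimming the majority class and SMOTE-style over-sampling of a minority class, can be illustrated with a minimal sketch. This is not the authors' implementation: the LLE step is omitted, the toy data are invented for illustration, the class name "other" is hypothetical, and the function names `undersample` and `smote_like` are assumptions of this example.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Trim every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, invented data: 8 majority rows ("low") and 3 minority rows ("other").
rows = [[0.1 * i, 1.0] for i in range(8)] + [[5.0, 5.0], [6.0, 5.5], [5.5, 6.5]]
labels = ["low"] * 8 + ["other"] * 3

trimmed_rows, trimmed_labels = undersample(rows, labels)
minority = [r for r, l in zip(rows, labels) if l == "other"]
synthetic = smote_like(minority, n_new=5)

print(Counter(trimmed_labels))
print(len(synthetic), "synthetic minority rows")
```

Trimming discards majority rows until the classes match in size, whereas the SMOTE-style step keeps every original row and adds interpolated minority rows, so each synthetic point lies on a segment between two real minority samples.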

Suggested Citation

  • Adnan Firoze & Rashedur M. Rahman, 2015. "Mining ICDDR, B Hospital Surveillance Data and Exhibiting Strategies for Balancing Large Unbalanced Datasets," International Journal of Healthcare Information Systems and Informatics (IJHISI), IGI Global, vol. 10(1), pages 39-66, January.
  • Handle: RePEc:igg:jhisi0:v:10:y:2015:i:1:p:39-66

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJHISI.2015010103
    Download Restriction: no