
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

Authors
  • Talayeh Razzaghi
  • Oleg Roderick
  • Ilya Safro
  • Nicholas Marko

Abstract

This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, and with imbalance in the classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expectation-maximization (EM) imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data in health applications, and show that our multilevel SVM-based method produces fast classification results that are more accurate and robust.
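The abstract names two ingredients: per-class misclassification costs for the SVM, and imputation of missing values by iterated regression. A minimal Python sketch of both, assuming only NumPy, is given here as an illustration; it is not the authors' implementation. The function names `iterative_regression_impute` and `balanced_class_weights` are hypothetical. The imputation is a simplified regression-based stand-in for EM (it refits least-squares models column by column and does not estimate a full covariance model), and the inverse-frequency weighting is one common heuristic for choosing class costs, not necessarily the one used in the paper. The multilevel coarsening framework itself is not reproduced.

```python
import numpy as np


def iterative_regression_impute(X, n_iter=20, tol=1e-6):
    """Fill NaNs by repeatedly regressing each incomplete column on the others.

    Hypothetical helper: a simplified, regression-based stand-in for EM-style
    imputation. Missing entries start at their column mean; each pass refits an
    ordinary least-squares model per incomplete column (trained on rows where
    that column is observed) and re-predicts the missing entries, stopping once
    the imputed values change by less than `tol`.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Initial guess: column means, matched to missing cells in row-major order.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue  # fully observed column: nothing to impute
            # Predict column j from an intercept plus all other (current) columns.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            observed = ~rows_missing
            coef, *_ = np.linalg.lstsq(design[observed], filled[observed, j], rcond=None)
            filled[rows_missing, j] = design[rows_missing] @ coef
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled


def balanced_class_weights(y):
    """Per-class cost multipliers inversely proportional to class frequency.

    Hypothetical helper using the heuristic n_samples / (n_classes * count),
    so the minority class receives a larger misclassification penalty.
    """
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


if __name__ == "__main__":
    # Toy data where column 1 = 2 * column 0 + 1, with one value missing.
    X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, np.nan]])
    print(iterative_regression_impute(X))
    # Imbalanced labels: three negatives, one positive.
    print(balanced_class_weights([0, 0, 0, 1]))
```

The resulting dictionary has the form accepted by, for example, scikit-learn's `SVC(class_weight=...)`, which scales the penalty `C` for each class by its weight; in a cost-sensitive SVM this gives the minority class a larger penalty on its slack variables.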

Suggested Citation

  • Talayeh Razzaghi & Oleg Roderick & Ilya Safro & Nicholas Marko, 2016. "Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-18, May.
  • Handle: RePEc:plo:pone00:0155119
    DOI: 10.1371/journal.pone.0155119

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155119
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0155119&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Huang, Chien-Ming & Lee, Yuh-Jye & Lin, Dennis K.J. & Huang, Su-Yun, 2007. "Model selection for support vector machines via uniform design," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 335-346, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jiaxin Li & Zijun Zhou & Jianyu Dong & Ying Fu & Yuan Li & Ze Luan & Xin Peng, 2021. "Predicting breast cancer 5-year survival using machine learning: A systematic review," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-23, April.
    2. Talayeh Razzaghi & Ilya Safro & Joseph Ewing & Ehsan Sadrfaridpour & John D. Scott, 2019. "Predictive models for bariatric surgery risks with imbalanced medical datasets," Annals of Operations Research, Springer, vol. 280(1), pages 1-18, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wolfgang Härdle & Yuh-Jye Lee & Dorothea Schäfer & Yi-Ren Yeh, 2009. "Variable selection and oversampling in the use of smooth support vector machines for predicting the default risk of companies," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 28(6), pages 512-534.
    2. Wolfgang Härdle & Yuh-Jye Lee & Dorothea Schäfer & Yi-Ren Yeh, 2007. "The Default Risk of Firms Examined with Smooth Support Vector Machines," Discussion Papers of DIW Berlin 757, DIW Berlin, German Institute for Economic Research.
    3. Yang, YouLong & Che, JinXing & Li, YanYing & Zhao, YanJun & Zhu, SuLing, 2016. "An incremental electric load forecasting model based on support vector regression," Energy, Elsevier, vol. 113(C), pages 796-808.
    4. Orestis P. Panagopoulos & Petros Xanthopoulos & Talayeh Razzaghi & Onur Şeref, 2019. "Relaxed support vector regression," Annals of Operations Research, Springer, vol. 276(1), pages 191-210, May.
    5. De Brabanter, K. & De Brabanter, J. & Suykens, J.A.K. & De Moor, B., 2010. "Optimized fixed-size kernel models for large data sets," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1484-1504, June.
    6. Onur Şeref & Talayeh Razzaghi & Petros Xanthopoulos, 2017. "Weighted relaxed support vector machines," Annals of Operations Research, Springer, vol. 249(1), pages 235-271, February.
    7. Chuang, S.C. & Hung, Y.C., 2010. "Uniform design over general input domains with applications to target region estimation in computer experiments," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 219-232, January.
    8. Feng, Zhong-kai & Niu, Wen-jing & Cheng, Chun-tian & Wu, Xin-yu, 2017. "Optimization of hydropower system operation by uniform dynamic programming for dimensionality reduction," Energy, Elsevier, vol. 134(C), pages 718-730.
