Binormal Precision–Recall Curves for Optimal Classification of Imbalanced Data

Author

Listed:
  • Zhongkai Liu

    (North Carolina State University)

  • Howard D. Bondell

    (University of Melbourne)

Abstract

Binary classification on imbalanced data, i.e., data with a large skew in the class distribution, is a challenging problem. Evaluation of classifiers via the receiver operating characteristic (ROC) curve is common in binary classification, and techniques to develop classifiers that optimize the area under the ROC curve have been proposed. However, for imbalanced data, the ROC curve tends to give an overly optimistic view. To address this disadvantage, we propose an approach based on the Precision–Recall (PR) curve under the binormal assumption: we choose the classifier that maximizes the area under the binormal PR curve. The asymptotic distribution of the resulting estimator is shown. Simulations, as well as real data results, indicate that the binormal Precision–Recall method outperforms approaches based on the area under the ROC curve.
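
To make the contrast between the two curves concrete, the following is a minimal sketch of the standard binormal model, in which classifier scores are assumed normal within each class. It is not the authors' estimator or optimization procedure. The function names, the prevalence parameter `pi`, and the example means and standard deviations are all assumptions chosen for illustration, and the PR area is approximated numerically with the trapezoidal rule.

```python
import math


def norm_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binormal_roc_auc(mu0, sigma0, mu1, sigma1):
    """Area under the binormal ROC curve; it does not depend on prevalence."""
    return 1.0 - norm_sf((mu1 - mu0) / math.hypot(sigma0, sigma1))


def binormal_pr_point(c, mu0, sigma0, mu1, sigma1, pi):
    """(recall, precision) when scores above threshold c are called positive."""
    tpr = norm_sf((c - mu1) / sigma1)  # recall among true positives
    fpr = norm_sf((c - mu0) / sigma0)  # false positive rate among negatives
    flagged = pi * tpr + (1.0 - pi) * fpr
    precision = pi * tpr / flagged if flagged > 0.0 else 1.0
    return tpr, precision


def binormal_pr_auc(mu0, sigma0, mu1, sigma1, pi, n_grid=4000):
    """Trapezoidal approximation of the area under the binormal PR curve."""
    lo = min(mu0 - 8.0 * sigma0, mu1 - 8.0 * sigma1)
    hi = max(mu0 + 8.0 * sigma0, mu1 + 8.0 * sigma1)
    # Sweep the threshold from high to low so that recall increases.
    points = [
        binormal_pr_point(hi - (hi - lo) * i / n_grid, mu0, sigma0, mu1, sigma1, pi)
        for i in range(n_grid + 1)
    ]
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * 0.5 * (p0 + p1)
    return area


if __name__ == "__main__":
    # Illustrative (assumed) score distributions: negatives N(0, 1), positives N(1.5, 1).
    for pi in (0.5, 0.1, 0.01):
        roc = binormal_roc_auc(0.0, 1.0, 1.5, 1.0)
        pr = binormal_pr_auc(0.0, 1.0, 1.5, 1.0, pi)
        print(f"prevalence={pi:.2f}  ROC AUC={roc:.3f}  PR AUC={pr:.3f}")
```

In this sketch the ROC area is the same at every prevalence, while the PR area shrinks as positives become rarer, which is one way to see why the ROC curve can look optimistic on imbalanced data.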

Suggested Citation

  • Zhongkai Liu & Howard D. Bondell, 2019. "Binormal Precision–Recall Curves for Optimal Classification of Imbalanced Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(1), pages 141-161, April.
  • Handle: RePEc:spr:stabio:v:11:y:2019:i:1:d:10.1007_s12561-019-09231-9
    DOI: 10.1007/s12561-019-09231-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-019-09231-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Donald Dorfman & Edward Alf, 1968. "Maximum likelihood estimation of parameters of signal detection theory—A direct solution," Psychometrika, Springer;The Psychometric Society, vol. 33(1), pages 117-124, March.
    2. Margaret Sullivan Pepe & Tianxi Cai & Gary Longton, 2006. "Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve," Biometrics, The International Biometric Society, vol. 62(1), pages 221-229, March.
    3. Kelly Zou & W. J. Hall, 2000. "Two transformation models for estimating an ROC curve derived from continuous data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 27(5), pages 621-631.

    Citations

    Cited by:

    1. Yifan Zhong & Chuang Cai & Tao Chen & Hao Gui & Jiajun Deng & Minglei Yang & Bentong Yu & Yongxiang Song & Tingting Wang & Xiwen Sun & Jingyun Shi & Yangchun Chen & Dong Xie & Chang Chen & Yunlang She, 2023. "PET/CT based cross-modal deep learning signature to predict occult nodal metastasis in lung cancer," Nature Communications, Nature, vol. 14(1), pages 1-14, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cheam, Amay S.M. & McNicholas, Paul D., 2016. "Modelling receiver operating characteristic curves using Gaussian mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 192-208.
    2. Y. Huang & M. S. Pepe, 2009. "A Parametric ROC Model-Based Approach for Evaluating the Predictiveness of Continuous Markers in Case–Control Studies," Biometrics, The International Biometric Society, vol. 65(4), pages 1133-1144, December.
    3. Xin Huang & Gengsheng Qin & Yixin Fang, 2011. "Optimal Combinations of Diagnostic Tests Based on AUC," Biometrics, The International Biometric Society, vol. 67(2), pages 568-576, June.
    4. Wang, Dan & Tian, Lili, 2017. "Parametric methods for confidence interval estimation of overlap coefficients," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 12-26.
    5. Kajal Lahiri & Liu Yang, 2023. "Predicting binary outcomes based on the pair-copula construction," Empirical Economics, Springer, vol. 64(6), pages 3089-3119, June.
    6. Yuanjia Wang & Huaihou Chen & Runze Li & Naihua Duan & Roberto Lewis-Fernández, 2011. "Prediction-Based Structured Variable Selection through the Receiver Operating Characteristic Curves," Biometrics, The International Biometric Society, vol. 67(3), pages 896-905, September.
    7. Chen, Xiwei & Vexler, Albert & Markatou, Marianthi, 2015. "Empirical likelihood ratio confidence interval estimation of best linear combinations of biomarkers," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 186-198.
    8. Sonia Pérez-Fernández & Pablo Martínez-Camblor & Peter Filzmoser & Norberto Corral, 2021. "Visualizing the decision rules behind the ROC curves: understanding the classification process," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(1), pages 135-161, March.
    9. Osamu Komori, 2011. "A boosting method for maximization of the area under the ROC curve," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(5), pages 961-979, October.
    10. Juana-María Vivo & Manuel Franco & Donatella Vicari, 2018. "Rethinking an ROC partial area index for evaluating the classification performance at a high specificity range," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 683-704, September.
    11. Rocío Aznar-Gimeno & Luis M. Esteban & Rafael del-Hoyo-Alonso & Ángel Borque-Fernando & Gerardo Sanz, 2022. "A Stepwise Algorithm for Linearly Combining Biomarkers under Youden Index Maximization," Mathematics, MDPI, vol. 10(8), pages 1-26, April.
    12. Kelly Zou & W. J. Hall, 2002. "Semiparametric and parametric transformation models for comparing diagnostic markers with paired design," Journal of Applied Statistics, Taylor & Francis Journals, vol. 29(6), pages 803-816.
    13. Zhang, Biao, 2006. "A semiparametric hypothesis testing procedure for the ROC curve area under a density ratio model," Computational Statistics & Data Analysis, Elsevier, vol. 50(7), pages 1855-1876, April.
    14. Graf Alexandra C. & Bauer Peter, 2009. "Model Selection Based on FDR-Thresholding Optimizing the Area under the ROC-Curve," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-20, June.
    15. Lloyd, Chris J. & Yong, Zhou, 1999. "Kernel estimators of the ROC curve are better than empirical," Statistics & Probability Letters, Elsevier, vol. 44(3), pages 221-228, September.
    16. Alicja Jokiel-Rokita & Rafał Topolnicki, 2019. "Minimum distance estimation of the binormal ROC curve," Statistical Papers, Springer, vol. 60(6), pages 2161-2183, December.
    17. Qing Lu & Nancy Obuchowski & Sungho Won & Xiaofeng Zhu & Robert C. Elston, 2010. "Using the Optimal Robust Receiver Operating Characteristic (ROC) Curve for Predictive Genetic Tests," Biometrics, The International Biometric Society, vol. 66(2), pages 586-593, June.
    18. Choi, Sungwoo & Park, Junyong, 2014. "Nonparametric additive model with grouped lasso and maximizing area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 313-325.
    19. Weining Shen & Jing Ning & Ying Yuan & Anna S. Lok & Ziding Feng, 2018. "Model-free scoring system for risk prediction with application to hepatocellular carcinoma study," Biometrics, The International Biometric Society, vol. 74(1), pages 239-248, March.
    20. Yuxin Zhu & Mei‐Cheng Wang, 2022. "Obtaining optimal cutoff values for tree classifiers using multiple biomarkers," Biometrics, The International Biometric Society, vol. 78(1), pages 128-140, March.
