Printed from https://ideas.repec.org/a/plo/pone00/0118432.html

The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets

Author

Listed:
  • Takaya Saito
  • Marc Rehmsmeier

Abstract

Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
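
The abstract's central point, that a high specificity can coexist with a low fraction of true positives among positive predictions, can be illustrated with a small worked calculation. The sketch below is not taken from the paper: the function names, the class sizes, and the operating point (recall 0.8, specificity 0.9) are hypothetical values chosen only for illustration. It holds the classifier's recall and specificity fixed and changes only the number of negatives.

```python
def counts_at(recall, specificity, n_pos, n_neg):
    """Build confusion-matrix counts for a given operating point.

    All inputs are hypothetical illustration values, not results from the paper.
    """
    tp = round(recall * n_pos)        # positives correctly predicted
    fn = n_pos - tp                   # positives missed
    tn = round(specificity * n_neg)   # negatives correctly rejected
    fp = n_neg - tn                   # negatives wrongly predicted positive
    return tp, fn, fp, tn


def confusion_metrics(tp, fn, fp, tn):
    """Return (recall, specificity, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)           # sensitivity, the ROC y-axis
    specificity = tn / (tn + fp)      # 1 - false positive rate, the ROC x-axis
    precision = tp / (tp + fp)        # positive predictive value (PPV), the PRC y-axis
    return recall, specificity, precision


# Same operating point, two class ratios (assumed for illustration).
balanced = confusion_metrics(*counts_at(0.8, 0.9, n_pos=1000, n_neg=1000))
imbalanced = confusion_metrics(*counts_at(0.8, 0.9, n_pos=1000, n_neg=100000))

for name, (rec, spec, prec) in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{name:>10}: recall={rec:.3f} specificity={spec:.3f} precision={prec:.3f}")
```

Under these assumed numbers the ROC coordinates (recall and specificity) are identical in both scenarios, while precision falls from 800/900 in the balanced case to 800/10800 in the imbalanced one, because the same 10% false positive rate now applies to a hundred times as many negatives.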

Suggested Citation

  • Takaya Saito & Marc Rehmsmeier, 2015. "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-21, March.
  • Handle: RePEc:plo:pone00:0118432
    DOI: 10.1371/journal.pone.0118432

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0118432&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Asa Ben-Hur & Cheng Soon Ong & Sören Sonnenburg & Bernhard Schölkopf & Gunnar Rätsch, 2008. "Support Vector Machines and Kernels for Computational Biology," PLOS Computational Biology, Public Library of Science, vol. 4(10), pages 1-10, October.
    2. Adi L Tarca & Vincent J Carey & Xue-wen Chen & Roberto Romero & Sorin Drăghici, 2007. "Machine Learning and Its Applications to Biology," PLOS Computational Biology, Public Library of Science, vol. 3(6), pages 1-11, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lior Shamir & John D Delaney & Nikita Orlov & D Mark Eckley & Ilya G Goldberg, 2010. "Pattern Recognition Software and Techniques for Biological Image Analysis," PLOS Computational Biology, Public Library of Science, vol. 6(11), pages 1-10, November.
    2. Stephen Gang Wu & Yuxuan Wang & Wu Jiang & Tolutola Oyetunde & Ruilian Yao & Xuehong Zhang & Kazuyuki Shimizu & Yinjie J Tang & Forrest Sheng Bao, 2016. "Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming," PLOS Computational Biology, Public Library of Science, vol. 12(4), pages 1-22, April.
    3. Früh, Linus & Kampen, Helge & Kerkow, Antje & Schaub, Günter A. & Walther, Doreen & Wieland, Ralf, 2018. "Modelling the potential distribution of an invasive mosquito species: comparative evaluation of four machine learning methods and their combinations," Ecological Modelling, Elsevier, vol. 388(C), pages 136-144.
    4. Alaa Tharwat & Aboul Ella Hassanien, 2019. "Quantum-Behaved Particle Swarm Optimization for Parameter Optimization of Support Vector Machine," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 576-598, October.
    5. Emily S W Wong & Margaret C Hardy & David Wood & Timothy Bailey & Glenn F King, 2013. "SVM-Based Prediction of Propeptide Cleavage Sites in Spider Toxins Identifies Toxin Innovation in an Australian Tarantula," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-11, July.
    6. Asa Ben-Hur & Cheng Soon Ong & Sören Sonnenburg & Bernhard Schölkopf & Gunnar Rätsch, 2008. "Support Vector Machines and Kernels for Computational Biology," PLOS Computational Biology, Public Library of Science, vol. 4(10), pages 1-10, October.
    7. Wang, Jia & Hu, Jun & Shen, Shifei & Zhuang, Jun & Ni, Shunjiang, 2020. "Crime risk analysis through big data algorithm with urban metrics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    8. Joana Rosado Coelho & João André Carriço & Daniel Knight & Jose-Luis Martínez & Ian Morrissey & Marco Rinaldo Oggioni & Ana Teresa Freitas, 2013. "The Use of Machine Learning Methodologies to Analyse Antibiotic and Biocide Susceptibility in Staphylococcus aureus," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-10, February.
    9. Shun Adachi, 2017. "Rigid geometry solves “curse of dimensionality” effects in clustering methods: An application to omics data," PLOS ONE, Public Library of Science, vol. 12(6), pages 1-20, June.
    10. Parag Parashar & Chun Han Chen & Chandni Akbar & Sze Ming Fu & Tejender S Rawat & Sparsh Pratik & Rajat Butola & Shih Han Chen & Albert S Lin, 2019. "Analytics-statistics mixed training and its fitness to semisupervised manufacturing," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-18, August.
    11. Kay H Brodersen & Thomas M Schofield & Alexander P Leff & Cheng Soon Ong & Ekaterina I Lomakina & Joachim M Buhmann & Klaas E Stephan, 2011. "Generative Embedding for Model-Based Classification of fMRI Data," PLOS Computational Biology, Public Library of Science, vol. 7(6), pages 1-19, June.
    12. Shweta Bhandare & Debra S Goldberg & Robin Dowell, 2017. "Discriminating between HuR and TTP binding sites using the k-spectrum kernel method," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-14, March.
    13. Ribeiro, Haroldo V. & Lopes, Diego D. & Pessa, Arthur A.B. & Martins, Alvaro F. & da Cunha, Bruno R. & Gonçalves, Sebastián & Lenzi, Ervin K. & Hanley, Quentin S. & Perc, Matjaž, 2023. "Deep learning criminal networks," Chaos, Solitons & Fractals, Elsevier, vol. 172(C).
    14. Wei Shui & Yiyi Zhang & Xinggui Wang & Yuanmeng Liu & Qianfeng Wang & Fei Duan & Chaowei Wu & Wanyu Shui, 2022. "Does Tibetan Household Livelihood Capital Enhance Tourism Participation Sustainability? Evidence from China’s Jiaju Tibetan Village," IJERPH, MDPI, vol. 19(15), pages 1-15, July.
    15. Marina M -C Vidovic & Nico Görnitz & Klaus-Robert Müller & Gunnar Rätsch & Marius Kloft, 2015. "SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-23, December.
    16. Emili Balaguer-Ballester & Christopher C Lapish & Jeremy K Seamans & Daniel Durstewitz, 2011. "Attracting Dynamics of Frontal Cortex Ensembles during Memory-Guided Decision-Making," PLOS Computational Biology, Public Library of Science, vol. 7(5), pages 1-19, May.
    17. A Ivanenko & P Watkins & M A J van Gerven & K Hammerschmidt & B Englitz, 2020. "Classifying sex and strain from mouse ultrasonic vocalizations using deep learning," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-27, June.
    18. Dolores Wolfram & Ravi Starzl & Hubert Hackl & Derek Barclay & Theresa Hautz & Bettina Zelger & Gerald Brandacher & W P Andrew Lee & Nadine Eberhart & Yoram Vodovotz & Johann Pratschke & Gerhard Piere, 2014. "Insights from Computational Modeling in Inflammation and Acute Rejection in Limb Transplantation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-11, June.
    19. Yue Deng & Yanyu Zhao & Yebin Liu & Qionghai Dai, 2013. "Differences Help Recognition: A Probabilistic Interpretation," PLOS ONE, Public Library of Science, vol. 8(6), pages 1-10, June.
    20. Charlotte Soneson & Sarah Gerster & Mauro Delorenzi, 2014. "Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.