
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework

Author

Listed:
  • Yusong Ke

    (School of Electronic and Information Engineering, Tongji University, Shanghai 201804, China)

  • Hongru Lin

    (School of Electrical and Information Engineering, Hunan University, Changsha 410082, China)

  • Yuting Ruan

    (School of Economics and Management, Fuzhou University, Fuzhou 350100, China)

  • Junya Tang

    (School of Computer Science and Technology, Tongji University, Shanghai 200070, China)

  • Li Li

    (School of Electronic and Information Engineering, Tongji University, Shanghai 201804, China)

Abstract

Large language models (LLMs) are increasingly adopted in medical question answering (QA) scenarios. However, LLMs are known to generate hallucinations and nonfactual information, which undermines their trustworthiness in high-stakes medical tasks. Conformal Prediction (CP) is recognized as a robust framework in machine learning that offers statistically rigorous guarantees of marginal (average) coverage for prediction sets. However, the applicability of CP to medical QA remains to be explored. To address this gap, this study proposes an enhanced CP framework for medical multiple-choice question answering (MCQA) tasks. The enhanced CP framework associates the nonconformity score with the frequency score of the correct option. Drawing on self-consistency, the framework generates multiple outputs for the same medical query and calculates the frequency score of each option, which addresses the issue of limited access to the model’s internal information. Furthermore, a risk control framework is incorporated into the enhanced CP framework to manage task-specific metrics through a monotonically decreasing loss function. The enhanced CP framework is evaluated on three popular MCQA datasets using off-the-shelf LLMs. Empirical results demonstrate that the enhanced CP framework achieves user-specified average (or marginal) error rates on the test set. Moreover, the results show that the test set’s average prediction set size (APSS) decreases as the risk level increases. The authors conclude that APSS is a promising evaluation metric for the uncertainty of LLMs.
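
The frequency-based procedure described in the abstract can be illustrated with a minimal split conformal sketch. It is not the authors' implementation: the function names are hypothetical, and the nonconformity score is assumed here to be 1 minus the frequency of the correct option among the sampled answers, which is one plausible reading of "associates the nonconformity score with the frequency score of the correct option". The risk control extension is not covered.

```python
import math
from collections import Counter


def frequency_scores(sampled_answers, options):
    """Fraction of sampled answers that selected each option."""
    counts = Counter(sampled_answers)
    total = len(sampled_answers)
    return {opt: counts.get(opt, 0) / total for opt in options}


def nonconformity(sampled_answers, options, correct_option):
    """Assumed score: 1 - frequency of the correct option (illustrative choice)."""
    return 1.0 - frequency_scores(sampled_answers, options)[correct_option]


def calibrate_threshold(calibration_scores, alpha):
    """Standard split conformal quantile: the ceil((n + 1)(1 - alpha))-th
    smallest calibration score, or infinity if that rank exceeds n."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: include every option
    return sorted(calibration_scores)[rank - 1]


def prediction_set(sampled_answers, options, threshold):
    """Keep every option whose score (1 - frequency) is within the threshold."""
    freqs = frequency_scores(sampled_answers, options)
    return [opt for opt in options if 1.0 - freqs[opt] <= threshold]


# Toy usage with made-up samples (not data from the paper).
options = ["A", "B", "C", "D"]
calibration = [
    (["A"] * 9 + ["B"], "A"),
    (["B"] * 6 + ["C"] * 4, "B"),
    (["C"] * 5 + ["D"] * 5, "D"),
    (["A"] * 8 + ["D"] * 2, "A"),
    (["B"] * 7 + ["A"] * 3, "A"),
    (["D"] * 10, "D"),
    (["C"] * 7 + ["B"] * 3, "C"),
]
scores = [nonconformity(s, options, y) for s, y in calibration]
threshold = calibrate_threshold(scores, alpha=0.25)
test_samples = ["A"] * 6 + ["B"] * 3 + ["C"]
print(prediction_set(test_samples, options, threshold))
```

Under exchangeability of calibration and test questions, sets built this way contain the correct option with probability at least 1 - alpha on average. A larger alpha gives a smaller threshold and therefore smaller sets, which matches the reported trend of APSS decreasing as the risk level increases.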

Suggested Citation

  • Yusong Ke & Hongru Lin & Yuting Ruan & Junya Tang & Li Li, 2025. "Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework," Mathematics, MDPI, vol. 13(9), pages 1-17, May.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:9:p:1538-:d:1650879

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/9/1538/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/9/1538/
    Download Restriction: no