Author Listed:
- Wang, Minghui
- Xiao, Pengyuan
- Yu, Mingzhuo
Abstract
Retrieval-augmented generation (RAG) has emerged as a dominant paradigm for grounding large language models in external knowledge, yet the choice of retrieval strategy remains underexplored in the biomedical domain. This study presents an empirical comparison of four retrieval strategies within a standardized RAG pipeline for biomedical question answering: BM25 (sparse), Contriever (general-purpose dense), MedCPT (domain-specific dense), and a hybrid that combines BM25 and MedCPT through reciprocal rank fusion. Experiments are conducted on three established benchmarks: PubMedQA, MedQA, and BioASQ Task B. Evaluation covers retrieval quality (Recall@10, Recall@20, and MRR@10), end-to-end QA accuracy, and answer faithfulness as measured by the RAGAS faithfulness metric. The hybrid strategy achieves the highest Recall@10 on all three datasets, reaching 0.761 on PubMedQA, 0.697 on MedQA, and 0.768 on BioASQ. The domain-specific MedCPT retriever consistently outperforms the general-purpose Contriever, while BM25 remains a competitive baseline that surpasses Contriever on two of the three benchmarks. End-to-end QA accuracy follows a similar pattern, with the hybrid strategy performing best at 0.741 on PubMedQA and 0.613 on MedQA. The faithfulness analysis indicates that domain-specific retrieval reduces hallucination rates by supplying more topically relevant context. These findings offer practical guidance for practitioners selecting a retrieval strategy for biomedical RAG applications.
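For readers unfamiliar with the hybrid method and the retrieval metrics named in the abstract, the sketch below shows a generic reciprocal rank fusion (RRF) of two ranked lists, plus common definitions of Recall@k and MRR@k. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the fusion constant, so the widely used default k=60 is assumed, and the document IDs and the two input rankings (standing in for BM25 and MedCPT output) are hypothetical placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default; the value used
    in the paper is not stated in the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranking[:k]) & set(relevant))
    return hits / len(relevant)


def mrr_at_k(rankings, relevant_sets, k):
    """Mean over queries of 1/rank of the first relevant hit in the top k (0 if none)."""
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


if __name__ == "__main__":
    # Hypothetical rankings standing in for BM25 and MedCPT output on one query.
    bm25 = ["d1", "d2", "d3", "d4"]
    medcpt = ["d3", "d1", "d5", "d2"]
    fused = reciprocal_rank_fusion([bm25, medcpt])
    print(fused)  # ['d1', 'd3', 'd2', 'd5', 'd4']

    relevant = {"d3", "d5"}  # hypothetical gold documents
    print(recall_at_k(fused, relevant, 2))  # 0.5
    print(mrr_at_k([fused], [relevant], 10))  # 0.5
```

RRF uses only ranks, not raw scores, which is why it can merge a sparse retriever and a dense retriever whose score scales are not comparable. Documents ranked moderately well by both lists (here `d1` and `d3`) rise above documents that only one retriever found.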
Suggested Citation
Wang, Minghui & Xiao, Pengyuan & Yu, Mingzhuo, 2026.
"Sparse, Dense, or Hybrid? Comparing Retrieval Strategies for Biomedical Question Answering with Retrieval-Augmented Generation,"
Journal of Science, Innovation & Social Impact, Pinnacle Academic Press, vol. 2(2), pages 141-152.
Handle:
RePEc:dba:jsisia:v:2:y:2026:i:2:p:141-152
General contact details of provider: https://pinnaclepubs.com/index.php/JSISI