
Multi Stage Retrieval for Web Search During Crisis

Author

Listed:
  • Claudiu Constantin Tcaciuc

    (Dipartimento di Automatica e Informatica, Politecnico di Torino, 10129 Torino, Italy)

  • Daniele Rege Cambrin

    (Dipartimento di Automatica e Informatica, Politecnico di Torino, 10129 Torino, Italy)

  • Paolo Garza

    (Dipartimento di Automatica e Informatica, Politecnico di Torino, 10129 Torino, Italy)

Abstract

During crisis events, the volume of digital information can increase by over 500% within hours, with social media platforms alone generating millions of crisis-related posts. This volume creates critical challenges for emergency responders, who need timely access to the small subset of accurate information relevant to them. Existing approaches rely heavily on large language models, which limit the scalability of the retrieval procedure and may introduce hallucinations. This paper introduces a multi-stage text retrieval framework to improve information retrieval during crises. The framework employs a novel three-stage extractive pipeline in which (1) a topic modeling component filters candidates based on thematic relevance, (2) an initial high-recall lexical retriever identifies a broad candidate set, and (3) a dense retriever reranks the remaining documents. This architecture balances computational efficiency with retrieval effectiveness, prioritizing high recall in the early stages and refining precision in the later ones. Being extractive, the framework avoids introducing hallucinations and achieves a 15% improvement in BERT-Score over existing solutions without requiring any costly abstractive model. Moreover, the sequential approach makes search 5% faster than a single-stage pipeline based on dense retrieval alone, with minimal effect on BERT-Score.
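The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the topic model, the lexical retriever, or the dense encoder, so every component here is an assumption. Stage 1 uses a keyword-overlap filter as a stand-in for topic modeling, stage 2 uses a hand-written BM25 scorer, and stage 3 uses hashed character trigrams as a toy substitute for a learned dense encoder. All function names, parameters, and example documents are hypothetical.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_filter(docs, topic_terms):
    """Stage 1 (stand-in for topic modeling): keep documents that share
    at least one term with a hand-picked topic vocabulary."""
    topic = set(topic_terms)
    return [d for d in docs if topic & set(tokenize(d))]


def bm25_top_k(query, docs, k, k1=1.5, b=0.75):
    """Stage 2 (assumed lexical retriever): score with BM25, keep the k best."""
    toks = [tokenize(d) for d in docs]
    if not toks:
        return []
    n = len(toks)
    avg_len = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))  # document frequency
    scored = []
    for doc, t in zip(docs, toks):
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def embed(text, dim=256):
    """Toy embedding (NOT a learned encoder): hashed character-trigram
    counts, L2-normalised so a dot product equals cosine similarity."""
    vec = [0.0] * dim
    s = " ".join(tokenize(text))
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_rerank(query, docs, top_n):
    """Stage 3: rerank the surviving candidates by embedding similarity."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]


def retrieve(query, docs, topic_terms, k=10, top_n=3):
    """Run the stages in the order given in the abstract. The output is
    always a subset of the input documents (purely extractive)."""
    candidates = topic_filter(docs, topic_terms)
    candidates = bm25_top_k(query, candidates, k)
    return dense_rerank(query, candidates, top_n)


docs = [
    "Flood warning issued for the river valley, roads closed",
    "Recipe for lemon cake with sugar",
    "Shelters open for flood evacuees near the river",
    "Football results from the weekend",
]
topic_terms = ["flood", "evacuees", "shelters", "warning"]
print(retrieve("where are flood shelters open", docs, topic_terms))
```

Each stage only narrows or reorders the candidate list, so the costlier similarity computation in the last stage runs on fewer documents than a single-stage dense search over the whole collection would. In a realistic setting the toy `embed` function would be replaced by a trained sentence encoder and the keyword filter by an actual topic model.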

Suggested Citation

  • Claudiu Constantin Tcaciuc & Daniele Rege Cambrin & Paolo Garza, 2025. "Multi Stage Retrieval for Web Search During Crisis," Future Internet, MDPI, vol. 17(6), pages 1-17, May.
  • Handle: RePEc:gam:jftint:v:17:y:2025:i:6:p:239-:d:1667343

Download full text from publisher

File URL: https://www.mdpi.com/1999-5903/17/6/239/pdf
Download Restriction: no

File URL: https://www.mdpi.com/1999-5903/17/6/239/
Download Restriction: no