IDEAS home Printed from https://ideas.repec.org/a/bla/jamist/v59y2008i12p2008-2023.html
   My bibliography  Save this article

Probabilistic passage models for semantic search of genomics literature

Author

Listed:
  • Jay Urbain
  • Nazli Goharian
  • Ophir Frieder

Abstract

We explore unsupervised learning techniques for extracting semantic information about biomedical concepts and topics, and introduce a passage retrieval model for using these semantics in context to improve genomics literature search. Our contributions include a new passage retrieval model based on an undirected graphical model (Markov Random Fields), and new methods for modeling passage‐concepts, document‐topics, and passage‐terms as potential functions within the model. Each potential function includes distributional evidence to disambiguate topics, concepts, and terms in context. The joint distribution across potential functions in the graph represents the probability of a passage being relevant to a biologist's information need. Relevance ranking within each potential function simplifies normalization across potential functions and eliminates the need for tuning of passage retrieval model parameters. Our dimensional indexing model facilitates efficient aggregation of topic, concept, and term distributions. The proposed passage‐retrieval model improves search results in the presence of varying levels of semantic evidence, outperforming models of query terms, concepts, or document topics alone. Our results exceed the state‐of‐the‐art for automatic document retrieval by 14.46% (0.3554 vs. 0.3105) and passage retrieval by 15.57% (0.1128 vs. 0.0976) as assessed by the TREC 2007 Genomics Track, and automatic document retrieval by 18.56% (0.3424 vs. 0.2888) as assessed by the TREC 2005 Genomics Track. Automatic document retrieval results for TREC 2007 and TREC 2005 are statistically significant at the 95% confidence level (p = .0359 and .0253, respectively). Passage retrieval is significant at the 90% confidence level (p = 0.0893).

Suggested Citation

  • Jay Urbain & Nazli Goharian & Ophir Frieder, 2008. "Probabilistic passage models for semantic search of genomics literature," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(12), pages 2008-2023, October.
  • Handle: RePEc:bla:jamist:v:59:y:2008:i:12:p:2008-2023
    DOI: 10.1002/asi.20896
    as

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.20896
    Download Restriction: no

    File URL: https://libkey.io/10.1002/asi.20896?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jamist:v:59:y:2008:i:12:p:2008-2023. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: http://www.asis.org .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.