Protein annotators' assistant: A novel application of information retrieval techniques

Author

  • Michael J. Wise

Abstract

The Protein Annotators' Assistant (or PAA) (http://www.ebi.ac.uk/paa/) is a software system which assists protein annotators in the task of assigning functions to newly sequenced proteins. Working backward from SwissProt, a database which describes known proteins, and a prior sequence similarity search that returns a list of known proteins similar to a query, PAA suggests keywords and phrases which may describe functions performed by the query. In a preprocessing step, a database is built from the protein names that appear in the SwissProt database, and against each protein are listed keywords and phrases that are extracted from the corresponding text records. Common words either in general English usage or from the biological domain are removed as the phrases are assembled. This process is assisted by the use of a simple stemming algorithm, which extends the list of stop‐words (i.e., reject words), together with a list of accept‐words. At runtime, the search algorithm, invoked by a user via a Web interface, takes a list of protein names and clusters the named proteins around keywords/phrases shared by members of the list. The assumption is that if these proteins have a particular keyword/phrase in common, and they are related to a query protein, then the keyword/phrase may also describe the query. Overall, PAA employs a number of IR techniques in a novel setting and is thus related to text categorization, where multiple categories may be suggested, except that in this case none of the categories are specified in advance.
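
The two stages described in the abstract (preprocessing text records into keywords/phrases, then clustering a list of proteins around shared terms) can be illustrated with a minimal Python sketch. This is not PAA's actual implementation: the stop-word and accept-word lists, the suffix-stripping stemmer, the rule that a term is either a single retained word or a run of adjacent retained words, and the three sample records are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical word lists; the abstract does not give PAA's real lists.
STOP_WORDS = {"the", "a", "of", "in", "and", "is", "by",
              "protein", "contain", "bind", "involved"}
ACCEPT_WORDS = {"binding"}  # kept even though its stem matches a stop-word


def crude_stem(word):
    """Strip a few common suffixes (a stand-in for a 'simple stemming algorithm')."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Stemming extends the stop list: any word sharing a stem with a stop-word is rejected.
STOP_STEMS = {crude_stem(w) for w in STOP_WORDS}


def is_rejected(word):
    if word in ACCEPT_WORDS:
        return False
    return word in STOP_WORDS or crude_stem(word) in STOP_STEMS


def extract_terms(text):
    """Return retained single words plus multi-word runs of adjacent retained words."""
    terms = set()
    for segment in re.split(r"[.,;:()]", text.lower()):
        run = []
        # The trailing "" sentinel is always rejected, which flushes the last run.
        for word in re.findall(r"[a-z0-9]+", segment) + [""]:
            if word and not is_rejected(word):
                run.append(word)
                continue
            terms.update(run)
            if len(run) > 1:
                terms.add(" ".join(run))
            run = []
    return terms


def build_index(records):
    """Preprocessing step: map each protein name to its extracted terms."""
    return {name: extract_terms(text) for name, text in records.items()}


def cluster_by_shared_terms(index, protein_names, min_shared=2):
    """Runtime step: group the named proteins around terms that several of them share."""
    by_term = defaultdict(set)
    for name in protein_names:
        for term in index.get(name, ()):  # unknown names are skipped
            by_term[term].add(name)
    return {t: names for t, names in by_term.items() if len(names) >= min_shared}


# Hypothetical records standing in for SwissProt text entries.
RECORDS = {
    "KIN1": "Serine/threonine kinase; binds ATP. Contains a zinc finger.",
    "KIN2": "Tyrosine kinase involved in ATP binding.",
    "ZNF9": "Contains a zinc finger; DNA binding.",
}

INDEX = build_index(RECORDS)
CLUSTERS = cluster_by_shared_terms(INDEX, ["KIN1", "KIN2", "ZNF9"])
```

In this sketch, terms such as "kinase" or "zinc finger" that occur for at least two of the listed proteins become candidate annotations for the query, while terms seen for only one protein are dropped. No categories are fixed in advance; they emerge from whatever terms the listed proteins happen to share.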

Suggested Citation

  • Michael J. Wise, 2000. "Protein annotators' assistant: A novel application of information retrieval techniques," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(12), pages 1131-1136.
  • Handle: RePEc:bla:jamest:v:51:y:2000:i:12:p:1131-1136
    DOI: 10.1002/1097-4571(2000)9999:99993.0.CO;2-F
