Printed from https://ideas.repec.org/a/bla/jamest/v24y1973i4p246-260.html

Document retrieval experiments using cluster analysis

Authors
  • Jack Minker
  • Eero Peltola
  • Gerald A. Wilson

Abstract

The objectives of this paper are to describe the effect of using weighted index terms in a document retrieval system, and to evaluate retrieval performance when queries are expanded by terms that occur in clusters with the query terms. Three data collections, each indexed by several methods, are used to develop explicit results; two of these collections were studied and reported on in previous work. The study both expands upon and extends previous work at the University of Maryland.

The effect of weighting index terms in the document collection, in the queries, and in the formation of clusters is analyzed. Eight cases are investigated in which index terms are weighted or unweighted. The best results are obtained when weighted index terms are used in forming clusters, in queries, and in documents. In this case, adding clustered terms to queries yields a significant improvement in retrieval performance on the new collection, relative to the performance with the unmodified data base. This improvement contrasts with the results of the previous study, where a degradation in performance, or at best an insignificant improvement, was obtained.

Comparisons are made to related work by Sparck‐Jones and her colleagues. This study tends to support the conclusion of Sparck‐Jones that weighted index terms provide better retrieval performance than unweighted terms. Adding clustered index terms to queries, however, yields unpredictable results: some collections show an improvement in retrieval performance, others a degradation or no change. Sparck‐Jones obtained an improvement in retrieval performance for her document collection. We conclude that the results are highly dependent upon the document collection, and that the technique should be employed with caution.
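
The abstract varies two things: whether index terms are weighted (in clusters, queries, and documents) and whether queries are expanded with terms that cluster with the query terms. The following is a minimal Python sketch of that experimental design, not the authors' method. It assumes cosine matching, single-link clustering of terms by the cosine of their document profiles, a hypothetical similarity threshold of 0.5, a hypothetical weight of 0.5 for added terms, and that the eight cases are the 2 × 2 × 2 combinations of weighted/unweighted terms in the three places. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

Vector = dict[str, float]  # sparse vector: key (term or doc id) -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (assumed matching function)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def unweight(v: Vector) -> Vector:
    """The 'unweighted' case: every positive weight becomes 1.0."""
    return {k: 1.0 for k, w in v.items() if w > 0}


def term_clusters(docs: dict[str, Vector], threshold: float) -> dict[str, set[str]]:
    """Hypothetical single-link term clusters: link terms whose document
    profiles have cosine >= threshold, then take connected components."""
    profiles: dict[str, Vector] = defaultdict(dict)  # term -> {doc_id: weight}
    for doc_id, vec in docs.items():
        for term, w in vec.items():
            profiles[term][doc_id] = w
    terms = sorted(profiles)
    parent = {t: t for t in terms}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)
    groups: dict[str, set[str]] = defaultdict(set)
    for t in terms:
        groups[find(t)].add(t)
    return {t: groups[find(t)] for t in terms}


def expand_query(query: Vector, cluster_of: dict[str, set[str]], added_weight: float) -> Vector:
    """Add each cluster mate of each query term not already in the query."""
    expanded = dict(query)
    for term in query:
        for mate in cluster_of.get(term, {term}):
            expanded.setdefault(mate, added_weight)
    return expanded


def rank(docs: dict[str, Vector], query: Vector) -> list[tuple[str, float]]:
    """Documents by descending cosine with the query; ties broken by id."""
    scored = [(d, cosine(query, v)) for d, v in docs.items()]
    return sorted(scored, key=lambda p: (-p[1], p[0]))


def run_eight_cases(docs: dict[str, Vector], query: Vector,
                    threshold: float = 0.5, added_weight: float = 0.5):
    """Weighted (True) or unweighted (False) terms in (clusters, query, docs)."""
    unweighted_docs = {d: unweight(v) for d, v in docs.items()}
    results = {}
    for w_clusters, w_query, w_docs in product((True, False), repeat=3):
        cluster_of = term_clusters(docs if w_clusters else unweighted_docs, threshold)
        q = query if w_query else unweight(query)
        # In an unweighted query, added terms also get weight 1.0 (assumption).
        q = expand_query(q, cluster_of, added_weight if w_query else 1.0)
        results[(w_clusters, w_query, w_docs)] = rank(docs if w_docs else unweighted_docs, q)
    return results


if __name__ == "__main__":
    toy_docs = {  # hypothetical weighted index terms per document
        "d1": {"cluster": 3.0, "analysis": 2.0, "retrieval": 1.0},
        "d2": {"document": 3.0, "retrieval": 2.0, "query": 1.0},
        "d3": {"cluster": 1.0, "analysis": 1.0, "classification": 2.0},
        "d4": {"query": 2.0, "expansion": 1.0, "retrieval": 1.0},
    }
    for case, ranking in run_eight_cases(toy_docs, {"cluster": 2.0}).items():
        print(case, [d for d, _ in ranking])
```

Even on this toy data, whether terms are weighted changes which terms end up in the same cluster, which is one way the collection-dependence reported in the abstract could arise.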

Suggested Citation

  • Jack Minker & Eero Peltola & Gerald A. Wilson, 1973. "Document retrieval experiments using cluster analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 24(4), pages 246-260, July.
  • Handle: RePEc:bla:jamest:v:24:y:1973:i:4:p:246-260
    DOI: 10.1002/asi.4630240404

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.4630240404
    Download Restriction: no

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.