
Automatic cataloguing and searching for retrospective data by use of OCR text

Author

  • Yuen‐Hsien Tseng

Abstract

This article describes our efforts in supporting information retrieval from OCR-degraded text. In particular, we report our approach to an automatic cataloging and searching contest for books in multiple languages. In this contest, 500 books in English, German, French, and Italian, published between the 1770s and the 1970s, were scanned into images and converted to digital text by OCR. The goal was to use only automatic methods to extract information for sophisticated searching. We adopted the vector space retrieval model, an n-gram indexing method, and a special weighting scheme to tackle this problem. Although this approach performs slightly worse than the best approach, which is based mainly on regular-expression matching, it has the advantage of being less language-dependent and less layout-sensitive, and is thus readily applicable to other languages and document collections. Problems of OCR text retrieval for some Asian languages are also discussed, and solutions are suggested.
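The abstract names the ingredients (vector space model, n-gram indexing, a weighting scheme) without giving details. The following is a minimal sketch of why character n-grams tolerate OCR errors, assuming character trigrams and a standard smoothed TF-IDF weighting with cosine similarity. The TF-IDF weighting is a swapped-in assumption, not the paper's unspecified "special weighting scheme", and the function names (`char_ngrams`, `build_index`, `search`) are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split lowercased text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_index(docs: list[str], n: int = 3):
    """Return TF-IDF vectors (dict n-gram -> weight) per document and the IDF table.

    Assumption: smoothed TF-IDF stands in for the paper's own weighting scheme.
    """
    tfs = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # count each n-gram once per document
    num_docs = len(docs)
    idf = {g: math.log((num_docs + 1) / (c + 1)) + 1.0 for g, c in df.items()}
    vecs = [{g: f * idf[g] for g, f in tf.items()} for tf in tfs]
    return vecs, idf


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], n: int = 3) -> list[tuple[float, int]]:
    """Rank documents by cosine similarity to the query, best first."""
    vecs, idf = build_index(docs, n)
    q_tf = Counter(char_ngrams(query, n))
    # Query n-grams never seen in the collection get zero weight.
    q_vec = {g: f * idf.get(g, 0.0) for g, f in q_tf.items()}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vecs)]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    docs = [
        "Information retrieval from degraded text",
        "A history of Italian opera",
        "Deutsche Grammatik und Rechtschreibung",
    ]
    # "infornation" mimics a typical OCR confusion of "m" with "rn".
    for score, idx in search("infornation retrieval", docs):
        print(f"{score:.3f}  {docs[idx]}")
```

An exact word match on "infornation" finds nothing, but most of its trigrams (" in", "inf", "nfo", "tio", "ion", ...) survive the OCR error, so the first document still ranks highest. Because trigrams need no tokenizer or stemmer, the same code runs unchanged on the German, French, and Italian texts, which is one way to read the abstract's claim that the approach is less language-dependent.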

Suggested Citation

  • Yuen‐Hsien Tseng, 2001. "Automatic cataloguing and searching for retrospective data by use of OCR text," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 52(5), pages 378-390.
  • Handle: RePEc:bla:jamist:v:52:y:2001:i:5:p:378-390
    DOI: 10.1002/1532-2890(2001)9999:99993.0.CO;2-A

    Full text from publisher: https://doi.org/10.1002/1532-2890(2001)9999:99993.0.CO;2-A

