
Automatic generation of Japanese–English bilingual thesauri based on bilingual corpora

Authors

  • Keita Tsuji
  • Kyo Kageura

Abstract

The authors propose a method for automatically generating Japanese–English bilingual thesauri from bilingual corpora. Here, a bilingual thesaurus is a set of bilingual equivalent words together with their synonyms. Most methods proposed so far for extracting bilingual equivalent word clusters from bilingual corpora depend heavily on word frequency and handle low-frequency clusters poorly. These low-frequency clusters are worth extracting because they contain many newly coined terms that are in demand but not yet listed in existing bilingual thesauri. Assuming that single, language-pair-independent methods such as frequency-based ones have reached their limits, and that a language-pair-dependent method used in combination with other methods shows promise, the authors propose the following approach: (a) extract translation pairs based on transliteration patterns; (b) remove those pairs from the candidate words; (c) extract translation pairs based on word frequency from the remaining candidate words; and (d) generate bilingual clusters from the extracted pairs using a graph-theoretic method. The authors report that the proposed method is significantly more effective than other methods.
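
The four steps (a) to (d) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the transliteration patterns, the frequency measure, or the graph-theoretic method, so the code below substitutes simple stand-ins and labels them as assumptions. Specifically, it assumes a tiny hypothetical katakana table (`KANA`), string similarity via `difflib.SequenceMatcher` for step (a), the Dice coefficient over sentence-level co-occurrence for step (c), and connected components for step (d). The corpus, thresholds, and all function names are invented for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Assumption: a deliberately tiny katakana-to-Latin table, not taken from the paper.
KANA = {"デ": "de", "ー": "", "タ": "ta", "モ": "mo", "ル": "ru"}


def romanize(word):
    """Return a rough romanization, or None if a character is missing from the toy table."""
    if not all(ch in KANA for ch in word):
        return None
    return "".join(KANA[ch] for ch in word)


def transliteration_pairs(ja_words, en_words, threshold=0.7):
    """Step (a): pair each romanizable Japanese word with its most similar English word."""
    pairs = set()
    for ja in ja_words:
        roman = romanize(ja)
        if roman is None:
            continue
        best = max(sorted(en_words),
                   key=lambda en: SequenceMatcher(None, roman, en).ratio(),
                   default=None)
        if best is not None and SequenceMatcher(None, roman, best).ratio() >= threshold:
            pairs.add((ja, best))
    return pairs


def frequency_pairs(corpus, ja_words, en_words, threshold=0.5):
    """Step (c): pair the remaining words by Dice coefficient of sentence co-occurrence."""
    ja_freq, en_freq, cooc = Counter(), Counter(), Counter()
    for ja_sent, en_sent in corpus:
        ja_set = set(ja_sent) & ja_words
        en_set = set(en_sent) & en_words
        ja_freq.update(ja_set)
        en_freq.update(en_set)
        cooc.update((ja, en) for ja in ja_set for en in en_set)
    return {
        (ja, en)
        for (ja, en), n in cooc.items()
        if 2 * n / (ja_freq[ja] + en_freq[en]) >= threshold
    }


def clusters(pairs):
    """Step (d): treat pairs as graph edges and return the connected components."""
    graph = defaultdict(set)
    for ja, en in pairs:
        graph[("ja", ja)].add(("en", en))
        graph[("en", en)].add(("ja", ja))
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


def build_thesaurus(corpus):
    ja_words = {w for ja_sent, _ in corpus for w in ja_sent}
    en_words = {w for _, en_sent in corpus for w in en_sent}
    translit = transliteration_pairs(ja_words, en_words)   # (a)
    ja_rest = ja_words - {ja for ja, _ in translit}        # (b)
    en_rest = en_words - {en for _, en in translit}
    freq = frequency_pairs(corpus, ja_rest, en_rest)       # (c)
    return clusters(translit | freq)                       # (d)


# Assumption: an invented four-sentence aligned corpus, already tokenized.
CORPUS = [
    (["計算機", "データ"], ["computer", "data"]),
    (["計算機", "モデル"], ["computer", "model"]),
    (["電算機", "データ"], ["computer", "data"]),
    (["電算機", "モデル"], ["computer", "model"]),
]

if __name__ == "__main__":
    for cluster in build_thesaurus(CORPUS):
        print(sorted(cluster))
```

In this toy corpus, the loanwords データ and モデル are matched to "data" and "model" by transliteration, and the two native synonyms 計算機 and 電算機 end up in one cluster with "computer". Skipping step (b) would let データ pair with "computer" under the same Dice threshold, which hints at why removing transliteration pairs before the frequency pass can help.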

Suggested Citation

  • Keita Tsuji & Kyo Kageura, 2006. "Automatic generation of Japanese–English bilingual thesauri based on bilingual corpora," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(7), pages 891-906, May.
  • Handle: RePEc:bla:jamist:v:57:y:2006:i:7:p:891-906
    DOI: 10.1002/asi.20351

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.20351
    Download Restriction: no

