IDEAS home Printed from https://ideas.repec.org/a/hin/jnlmpe/8048335.html
   My bibliography  Save this article

Naive Bayesian Prediction of Japanese Annotated Corpus for Textual Semantic Word Formation Classification

Author

Listed:
  • Zhoushao Hao
  • Gengxin Sun

Abstract

With the rapid development of Japanese information processing technology, problems such as polysemy and ambiguity at the text and dialogue level, as well as unregistered words, have become increasingly prominent because computers cannot fully “understand†the semantics of words. How to make the computer “understand†the semantics of words accurately requires the computer to “understand†the rules of converting and integrating words into words from the perspective of semantics. Traditional Japanese text classification mostly adopts the text representation method of vector space model, which has the problem of confusing classification effect. Therefore, this paper proposes the topic of constructing a semantic word formation pattern prediction model based on a large-scale annotated corpus. This paper proposes a solution that combines Japanese semantic word formation rules with pattern recognition algorithms. Aiming at this scheme, a variety of pattern recognition algorithms were compared and analyzed, and the naive Bayesian model was decided to predict semantic word formation patterns. This paper further improves the accuracy of computer prediction of Japanese semantic word formation patterns by adding part of speech. Before modeling, the parts of speech of words are automatically tagged and manually checked based on the original annotated corpus. In the research on predicting Japanese semantic word formation patterns, this paper builds a semantic word formation pattern prediction model based on Naive Bayes and conducts simulation experiments. We divide the eight types of semantic word formation patterns in the annotated corpus into two groups, and divide the obtained sample sets into training sets and test sets, so that the Naive Bayes model first learns semantic word formation rules based on the training sets of each group. Semantic word formation patterns are predicted on the test set for each group. The simulation results show that the prediction model of semantic word formation mode has a generally high degree of fit and prediction accuracy. The prediction model of semantic word formation pattern based on this theory can ensure that the computer can judge the semantic word formation pattern more accurately.

Suggested Citation

  • Zhoushao Hao & Gengxin Sun, 2022. "Naive Bayesian Prediction of Japanese Annotated Corpus for Textual Semantic Word Formation Classification," Mathematical Problems in Engineering, Hindawi, vol. 2022, pages 1-14, March.
  • Handle: RePEc:hin:jnlmpe:8048335
    DOI: 10.1155/2022/8048335
    as

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/mpe/2022/8048335.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/mpe/2022/8048335.xml
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2022/8048335?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hin:jnlmpe:8048335. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Mohamed Abdelhakeem (email available below). General contact details of provider: https://www.hindawi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.