
Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach

Authors

Listed:
  • Santosh Kumar Bharti

    (Pandit Deendayal Energy University)

  • Rajeev Kumar Gupta

    (Pandit Deendayal Energy University)

  • Samir Patel

    (Pandit Deendayal Energy University)

  • Manan Shah

    (Pandit Deendayal Energy University)

Abstract

In natural language processing, part-of-speech (POS) tagging is one of the most important tasks. It plays a vital role in applications such as sentiment analysis, text summarization, and opinion mining. POS tagging is the process of assigning POS information (noun, pronoun, verb, etc.) to a given word, determined in the context of the word's relationship with the surrounding words. Hindi is widely used in countries such as India, Nepal, the United States, and Mauritius. A majority of Indians are accustomed to reading and writing in Hindi, and they also write in Hindi on social media platforms such as Twitter, Facebook, and WhatsApp. POS tagging is a key step in analyzing this Hindi text from social media. Text written in Hindi is ambiguous in nature and morphologically rich, which makes identifying POS information challenging. In this article, a heuristic-based approach is proposed for identifying POS information. The proposed method deploys a context-based bigram model that creates bigram sequences based on each word's relationship with its adjacent words. It then selects the most likely POS information for a word based on both the forward and reverse bigram sequences. The experimental results of the proposed heuristic approach are compared with existing state-of-the-art techniques, including the hidden Markov model, decision tree, conditional random fields, support vector machine, neural network, and recurrent neural network. It is observed that the proposed heuristic approach for POS tagging in Hindi outperforms the existing techniques and attains an accuracy of 94.3%.
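The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of combining forward and reverse bigram evidence, not the authors' method. It assumes that the "forward" evidence is how often a tag follows a given previous word and the "reverse" evidence is how often a tag precedes a given next word, and that the two counts are simply added. The function names (`train`, `tag_sentence`), the tag labels, and the tiny romanized Hindi corpus are all hypothetical and chosen only for illustration. The ambiguous word "sona" (which can mean "gold" or "to sleep") shows how context decides the tag.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # sentence boundary markers (an assumption of this sketch)

# Hypothetical toy corpus in romanized Hindi; not taken from the paper.
TRAIN = [
    [("ram", "NOUN"), ("ghar", "NOUN"), ("jata", "VERB"), ("hai", "AUX")],
    [("sita", "NOUN"), ("khana", "NOUN"), ("khati", "VERB"), ("hai", "AUX")],
    [("vah", "PRON"), ("sona", "VERB"), ("chahta", "VERB"), ("hai", "AUX")],
    [("sona", "NOUN"), ("mehnga", "ADJ"), ("hai", "AUX")],
]


def train(tagged_sentences):
    """Collect a lexicon plus forward and reverse bigram counts."""
    lexicon = defaultdict(Counter)  # word -> Counter of tags seen for that word
    forward = Counter()             # (previous word, tag of current word) -> count
    reverse = Counter()             # (next word, tag of current word) -> count
    for sentence in tagged_sentences:
        words = [START] + [w for w, _ in sentence] + [END]
        for i, (word, tag) in enumerate(sentence, start=1):
            lexicon[word][tag] += 1
            forward[(words[i - 1], tag)] += 1
            reverse[(words[i + 1], tag)] += 1
    return lexicon, forward, reverse


def tag_sentence(words, model):
    """Pick, for each word, the tag with the highest forward + reverse count."""
    lexicon, forward, reverse = model
    all_tags = sorted({t for tags in lexicon.values() for t in tags})
    padded = [START] + list(words) + [END]
    result = []
    for i, word in enumerate(words, start=1):
        known = word in lexicon
        # Known words are limited to their observed tags; unknown words may take any tag.
        candidates = sorted(lexicon[word]) if known else all_tags

        def score(tag):
            context = forward[(padded[i - 1], tag)] + reverse[(padded[i + 1], tag)]
            lexical = lexicon[word][tag] if known else 0
            return (context, lexical)  # lexical frequency only breaks context ties

        result.append((word, max(candidates, key=score)))
    return result


model = train(TRAIN)
print(tag_sentence(["vah", "sona", "chahta", "hai"], model))
print(tag_sentence(["sona", "mehnga", "hai"], model))
```

In this sketch "sona" is tagged as a verb after "vah" and as a noun at the start of "sona mehnga hai", because the neighboring words support different tags. A practical tagger would need smoothing, a much larger annotated corpus, and a principled way to weight the two directions; none of those details are given in the abstract.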

Suggested Citation

  • Santosh Kumar Bharti & Rajeev Kumar Gupta & Samir Patel & Manan Shah, 2024. "Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach," Annals of Data Science, Springer, vol. 11(1), pages 347-378, February.
  • Handle: RePEc:spr:aodasc:v:11:y:2024:i:1:d:10.1007_s40745-022-00434-4
    DOI: 10.1007/s40745-022-00434-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40745-022-00434-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

