
Improve Naïve Bayesian Classifier by Using Genetic Algorithm for Arabic Document

Author

Listed:
  • Farah Zawaideh

    (Irbid National University)

  • Raed Sahawneh

    (Irbid National University)

Abstract

Automatic text categorization (TC) has become one of the most active research areas in data mining, information retrieval, web text mining, and natural language processing, owing to the vast number of new documents that information retrieval systems must handle. This paper proposes a new TC technique that classifies Arabic text documents using a naïve Bayesian classifier combined with a genetic algorithm model; the algorithm classifies documents by generating a random sample of chromosomes that represent documents in the corpus. The model aims to improve the performance of the naïve Bayesian classifier by applying the genetic algorithm. Experimental results show that precision and recall increase as the number of test documents grows; precision ranged from 0.8 to 0.97 across the different testing environments. The number of genes placed in each chromosome was also tested, and the experiments show that the best value is 50 genes.
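
The abstract does not spell out how chromosomes are encoded or scored, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each gene is a character n-gram, that a chromosome is a fixed-size set of such genes, and that fitness is the accuracy of a naïve Bayes classifier restricted to those genes. The function names (`char_ngrams`, `train_nb`, `predict_nb`, `fitness`, `evolve`), the trigram size, the population settings, and the crossover and mutation rules are all illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split a string into overlapping character n-grams (works on any script)."""
    text = " ".join(text.split())  # collapse runs of whitespace
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train_nb(docs, labels, features):
    """Multinomial naive Bayes with add-one smoothing, restricted to `features`."""
    counts = defaultdict(Counter)
    class_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(g for g in char_ngrams(doc) if g in features)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(counts[label].values()) + len(features)
        log_prior = math.log(n_docs / len(docs))
        log_like = {f: math.log((counts[label][f] + 1) / total) for f in features}
        model[label] = (log_prior, log_like)
    return model


def predict_nb(model, doc):
    """Return the label with the highest log posterior; unknown n-grams are skipped."""
    grams = char_ngrams(doc)

    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[g] for g in grams if g in log_like)

    return max(sorted(model), key=score)


def fitness(chromosome, train, valid):
    """Assumed fitness: validation accuracy of naive Bayes using only these genes."""
    model = train_nb(train[0], train[1], set(chromosome))
    hits = sum(predict_nb(model, d) == y for d, y in zip(*valid))
    return hits / len(valid[0])


def evolve(train, valid, n_genes=50, pop_size=20, generations=15, seed=0):
    """Search for a good gene set; `train` and `valid` are (docs, labels) pairs."""
    rng = random.Random(seed)
    vocab = sorted({g for d in train[0] for g in char_ngrams(d)})
    n_genes = min(n_genes, len(vocab))
    # Initial population: random chromosomes drawn from the n-gram vocabulary.
    population = [rng.sample(vocab, n_genes) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, train, valid),
                        reverse=True)
        parents = ranked[:pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's genes from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), n_genes)
            # Mutation: swap one gene for an n-gram the child does not hold yet.
            unused = [g for g in vocab if g not in child]
            if unused:
                child[rng.randrange(n_genes)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(c, train, valid))
```

Calling `evolve((train_docs, train_labels), (valid_docs, valid_labels))` returns the best chromosome found. The default of 50 genes mirrors the value the abstract reports as best, while every other default is arbitrary. Character n-grams are used here because they need no language-specific tokenizer, which is an assumption prompted by the "N-gram processing" keyword rather than a detail confirmed by the abstract.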

Suggested Citation

  • Farah Zawaideh & Raed Sahawneh, 2018. "Improve Naïve Bayesian Classifier by Using Genetic Algorithm for Arabic Document," Proceedings of International Academic Conferences 6409186, International Institute of Social and Economic Sciences.
  • Handle: RePEc:sek:iacpro:6409186

    Download full text from publisher

    File URL: https://iises.net/proceedings/35th-international-academic-conference-barcelona-spain/table-of-content/detail?cid=64&iid=054&rid=9186
    File Function: First version, 2018
    Download Restriction: no

    More about this item

    Keywords

    Data mining; Text classification; Genetic algorithm; Naïve Bayesian Classifier; N-gram processing;

    JEL classification:

    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
