Printed from https://ideas.repec.org/a/ids/ijdmmm/v12y2020i3p350-364.html

Performance of authorship attribution classifiers with short texts: application of religious Arabic fatwas

Authors
  • Mohammed Al-Sarem
  • Abdel-Hamid Emara
  • Ahmed Abdel Wahab

Abstract

Although authorship attribution is a well-known problem in the authorship analysis domain, research in Arabic contexts is still limited. In addition, the performance of attribution methods on training sets made up of short textual documents has received little attention even in other languages, such as English, Chinese, Spanish and Dutch. This work therefore examines the performance of attribution classifiers on short Arabic textual documents. The experiments use six well-known classifiers, namely the C4.5 decision tree, naive Bayes, K-NN, a Markov model, SMO and the Burrows Delta method. We experiment with various feature combinations. The results show that combining word-based lexical features with structural features yields the best accuracy, so we use this combination as a baseline for further investigation. We also examine the effect of combining n-gram features with this baseline: the results indicate that some classifiers improve while others do not. Finally, the results show that the naive Bayes method gives the highest accuracy among all the attribution classifiers.
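The Burrows Delta method named in the abstract can be illustrated with a short, generic sketch. This is not the authors' implementation, whose details are not given on this page. It follows the textbook formulation under stated assumptions: features are the n most frequent words in the training corpus, the mean and standard deviation of each word's relative frequency are computed over the individual training texts, each author's profile is built from that author's texts concatenated, and the predicted author is the one with the smallest mean absolute difference of z-scores. The function names (`tokenize`, `rel_freqs`, `burrows_delta`), the `n_words=50` default and the plain `\w+` tokenization are hypothetical choices made for illustration. Arabic-specific preprocessing such as normalization or diacritic removal is not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: plain \w+ tokenization. Python's re is Unicode-aware, so
    # Arabic letters match \w; no Arabic normalization is applied here.
    return re.findall(r"\w+", text.lower())


def rel_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word in one token list.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(train, test_text, n_words=50):
    """Attribute test_text to an author in train (author -> list of texts).

    Returns (best_author, {author: delta}); a lower delta means closer style.
    """
    # 1. Feature set: the n most frequent words in the training corpus.
    tokenized = {a: [tokenize(t) for t in texts] for a, texts in train.items()}
    corpus = Counter(tok for docs in tokenized.values() for doc in docs for tok in doc)
    vocab = [w for w, _ in corpus.most_common(n_words)]
    if not vocab:
        raise ValueError("training corpus is empty")

    # 2. Mean and population standard deviation of each word's relative
    #    frequency, computed over the individual training texts.
    text_vecs = [rel_freqs(doc, vocab) for docs in tokenized.values() for doc in docs]
    n = len(text_vecs)
    means = [sum(col) / n for col in zip(*text_vecs)]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(zip(*text_vecs), means)]

    def zscores(vec):
        # Words with zero variance carry no information; score them as 0.
        return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(vec, means, stds)]

    # 3. Delta = mean absolute difference between z-score vectors.
    test_z = zscores(rel_freqs(tokenize(test_text), vocab))
    deltas = {}
    for author, docs in tokenized.items():
        profile = zscores(rel_freqs([tok for doc in docs for tok in doc], vocab))
        deltas[author] = sum(abs(x - y) for x, y in zip(test_z, profile)) / len(vocab)
    return min(deltas, key=deltas.get), deltas


if __name__ == "__main__":
    # Toy English example: author A favours "the"/"and", author B "a"/"or".
    train = {"A": ["the cat and the dog", "the bird and the cat"],
             "B": ["a cat or a dog", "a bird or a fish"]}
    best, deltas = burrows_delta(train, "the fish and the dog")
    print(best)  # A
```

Delta needs only frequency counts and a distance, with no trained model. That makes it a natural baseline next to learned classifiers such as naive Bayes or SMO when each author has only a few short texts.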

Suggested Citation

  • Mohammed Al-Sarem & Abdel-Hamid Emara & Ahmed Abdel Wahab, 2020. "Performance of authorship attribution classifiers with short texts: application of religious Arabic fatwas," International Journal of Data Mining, Modelling and Management, Inderscience Enterprises Ltd, vol. 12(3), pages 350-364.
  • Handle: RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:350-364

    Download full text from publisher

    File URL: http://www.inderscience.com/link.php?id=108719
    Download Restriction: Access to full text is restricted to subscribers.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.