
Authorship Attribution of Noisy Text Data With a Comparative Study of Clustering Methods

Authors

  • Zohra Hamadache (USTHB University, Bab Ezzouar, Algeria)

  • Halim Sayoud (USTHB University, Bab Ezzouar, Algeria)

Abstract

With the rapid growth in the volume of data available via the internet, visual analytics (VA) has emerged as a way of visualizing multidimensional data from different perspectives, revealing interesting information about the data and making them clearer and more intelligible. In this investigation, the authors focus on the VA-based Authorship Attribution (AA) task, applied to noisy text data. Furthermore, this article proposes a 3D Visual Analytics technique based on a sphere implementation. The dataset contains several text documents written by 5 American philosophers, with an average length of 850 words per text, which were scanned and then corrupted with different noise levels. The results show that the hierarchical clustering technique, using a fully automated threshold, achieves high authorship attribution accuracy, especially with character trigrams and ending bigrams, where the clustering recognition rate (CRR) reaches 100% at noise levels from 0% to 7%. In addition, the proposed 3D sphere technique appears quite promising, showing high clustering performance, mainly with words.
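
The abstract names its features and clustering method but gives no procedural detail, so the following is only a minimal sketch of that kind of pipeline, not the authors' implementation. Several choices are assumptions made for illustration: "ending bigrams" is taken to mean the last two characters of each word, noise is modeled as uniform random character substitution (the paper's scan-based corruption may differ), documents are compared by cosine distance with average linkage, and the merge threshold is a hand-chosen value rather than the fully automated threshold reported in the abstract. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def char_trigrams(text):
    """Relative frequencies of overlapping character trigrams."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def ending_bigrams(text):
    """Relative frequencies of the last two characters of each word
    (assumed reading of 'ending bigrams')."""
    counts = Counter(w[-2:] for w in text.lower().split() if len(w) >= 2)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def add_noise(text, level, seed=0):
    """Replace a fraction `level` of characters with random lowercase
    letters (hypothetical noise model, not the paper's)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in rng.sample(range(len(chars)), int(level * len(chars))):
        chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)


def agglomerative(profiles, threshold):
    """Average-linkage agglomerative clustering that stops merging once
    the closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        total = sum(cosine_distance(profiles[i], profiles[j])
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        if d > threshold:  # hand-chosen cutoff, an assumption
            break
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters
```

In this sketch, a noisy experiment would call `add_noise(text, 0.07)` on each document, build profiles with `char_trigrams` or `ending_bigrams`, and pass the list of profiles to `agglomerative`; each returned cluster is a list of document indices that can then be compared against the known authors.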

Suggested Citation

  • Zohra Hamadache & Halim Sayoud, 2018. "Authorship Attribution of Noisy Text Data With a Comparative Study of Clustering Methods," International Journal of Knowledge and Systems Science (IJKSS), IGI Global, vol. 9(2), pages 45-69, April.
  • Handle: RePEc:igg:jkss00:v:9:y:2018:i:2:p:45-69

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJKSS.2018040103
    Download Restriction: no
