IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0054998.html

Automated Authorship Attribution Using Advanced Signal Classification Techniques

Author

Listed:
  • Maryam Ebrahimpour
  • Tālis J Putniņš
  • Matthew J Berryman
  • Andrew Allison
  • Brian W-H Ng
  • Derek Abbott

Abstract

In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We adopt an approach of preprocessing each text by stripping it of all characters except a-z and space. This is in order to increase the portability of the software to different types of texts. We test the methodology on a corpus of undisputed English texts, and use leave-one-out cross validation to demonstrate classification accuracies in excess of 90%. We further test our methods on the Federalist Papers, which have a partly disputed authorship and a fair degree of scholarly consensus. And finally, we apply our methodology to the question of the authorship of the Letter to the Hebrews by comparing it against a number of original Greek texts of known authorship. These tests identify where some of the limitations lie, motivating a number of open questions for future work. An open source implementation of our methodology is freely available for use at https://github.com/matthewberryman/author-detection.

Suggested Citation

  • Maryam Ebrahimpour & Tālis J Putniņš & Matthew J Berryman & Andrew Allison & Brian W-H Ng & Derek Abbott, 2013. "Automated Authorship Attribution Using Advanced Signal Classification Techniques," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-12, February.
  • Handle: RePEc:plo:pone00:0054998
    DOI: 10.1371/journal.pone.0054998
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0054998
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0054998&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0054998?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Other versions of this item:

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Haoran Zhu & Lei Lei, 2022. "The Research Trends of Text Classification Studies (2000–2020): A Bibliometric Analysis," SAGE Open, , vol. 12(2), pages 21582440221, April.
    2. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0054998. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.