Author Identification from Literary Articles with Visual Features: A Case Study with Bangla Documents

Author

Listed:
  • Ankita Dhar

    (Department of Computational Science, Brainware University, Kolkata 700125, India)

  • Himadri Mukherjee

    (Department of Computer Science, West Bengal State University, Kolkata 700126, India)

  • Shibaprasad Sen

    (Techno Main Saltlake, Kolkata 700091, India)

  • Md Obaidullah Sk

    (Department of Computer Science and Engineering, Aliah University, Kolkata 700156, India)

  • Amitabha Biswas

    (Department of Computer Science, West Bengal State University, Kolkata 700126, India)

  • Teresa Gonçalves

    (Department of Computer Science, University of Évora, 7000-671 Évora, Portugal
    ALGORITMI Research Center, Vista Lab, University of Évora, 7000-671 Évora, Portugal)

  • Kaushik Roy

    (Department of Computer Science, West Bengal State University, Kolkata 700126, India)

Abstract

Author identification is an important aspect of literary analysis and is studied in natural language processing (NLP). It helps identify the most probable author of articles, news texts, or social media comments and tweets, for example. It can also be applied in other domains such as criminal and civil cases, cybersecurity, forensics, and the identification of plagiarists, among many others. An automated system in this context can thus be very beneficial for society. In this paper, we propose a convolutional neural network (CNN)-based system for author identification from literary articles. The system uses visual features along with a five-layer CNN to identify authors. The prime motivation behind this approach was the feasibility of identifying distinct writing styles through a visualization of the writing patterns. Experiments were performed on 1200 articles from 50 authors, achieving a maximum accuracy of 93.58%. Furthermore, to see how the system performed on different volumes of data, experiments were also performed on partitions of the dataset. The system outperformed standard handcrafted feature-based techniques as well as established works on publicly available datasets.

Suggested Citation

  • Ankita Dhar & Himadri Mukherjee & Shibaprasad Sen & Md Obaidullah Sk & Amitabha Biswas & Teresa Gonçalves & Kaushik Roy, 2022. "Author Identification from Literary Articles with Visual Features: A Case Study with Bangla Documents," Future Internet, MDPI, vol. 14(10), pages 1-20, September.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:10:p:272-:d:923732

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/10/272/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/10/272/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zeev Volkovich, 2020. "A Short-Patterning of the Texts Attributed to Al Ghazali: A “Twitter Look” at the Problem," Mathematics, MDPI, vol. 8(11), pages 1-16, November.
    2. Jaroslav Ráček & Jan Ministr, 2014. "Tools for Automatic Recognition of Persons and their Relationships in Unstructured Data [Nástroje pro automatické rozpoznávání entit a jejich vztahů v nestrukturovaných textech]," Acta Informatica Pragensia, Prague University of Economics and Business, vol. 2014(3), pages 280-287.
    3. Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
    4. Kargin, Vladislav, 2016. "On variation of word frequencies in Russian literary texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 445(C), pages 328-334.
    5. Mingfang Wu & David Hawking & Andrew Turpin & Falk Scholer, 2012. "Using anchor text for homepage and topic distillation search tasks," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(6), pages 1235-1255, June.
    6. Tingting Zhang & Baozhen Lee & Qinghua Zhu, 2019. "Semantic measure of plagiarism using a hierarchical graph model," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 209-239, October.
    7. Haoran Zhu & Lei Lei, 2022. "The Research Trends of Text Classification Studies (2000–2020): A Bibliometric Analysis," SAGE Open, vol. 12(2), pages 21582440221, April.
    8. Mike Thelwall, 2017. "Avoiding obscure topics and generalising findings produces higher impact research," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 307-320, January.
    9. Chunneng Huang & Tianjun Fu & Hsinchun Chen, 2010. "Text‐based video content classification for online video‐sharing sites," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(5), pages 891-906, May.
    10. Silvia Corbara & Alejandro Moreo & Fabrizio Sebastiani, 2023. "Syllabic quantity patterns as rhythmic features for Latin authorship attribution," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 128-141, January.
    11. Song, Min & Kim, Erin Hea-Jin & Kim, Ha Jin, 2015. "Exploring author name disambiguation on PubMed-scale," Journal of Informetrics, Elsevier, vol. 9(4), pages 924-941.
    12. Mark Bukowski & Sandra Geisler & Thomas Schmitz-Rode & Robert Farkas, 2020. "Feasibility of activity-based expert profiling using text mining of scientific publications and patents," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 579-620, May.
    13. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    14. Stefano Sbalchiero & Maria Stella Righettini, 2017. "Rhetorical manifestation of institutional transformation," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(3), pages 1279-1296, May.
    15. Guillaume Cabanac & Ingo Frommholz & Philipp Mayr, 2018. "Bibliometric-enhanced information retrieval: preface," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1225-1227, August.
    16. Refat Aljumily, 2015. "Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to “Shakespeare Authorship Question”," Social Sciences, MDPI, vol. 4(3), pages 1-42, September.
    17. Gordon J. Ross, 2020. "Tracking the evolution of literary style via Dirichlet–multinomial change point regression," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 149-167, January.
