
Probing the Topological Properties of Complex Networks Modeling Short Written Texts

Author

Listed:
  • Diego R Amancio

Abstract

In recent years, graph theory has been widely employed to probe several properties of language. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of text or entire books. This approach has certainly worked well, and many informative discoveries have been made this way, but it raises an uncomfortable question: could there be important topological patterns in small pieces of text? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.
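
As a rough illustration of the idea in the abstract, the sketch below splits a token sequence into short subtexts, builds a word adjacency network for each one, and computes two common topological measurements. It is not the author's implementation. The function names, the undirected and unweighted simplification, and the choice of average degree and clustering coefficient are all assumptions made for this example, and preprocessing steps such as stopword removal or lemmatization are omitted.

```python
from collections import defaultdict


def split_into_subtexts(tokens, size):
    """Split tokens into consecutive, non-overlapping subtexts of `size` tokens.

    A trailing remainder shorter than `size` is discarded (an assumption).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def word_adjacency_network(tokens):
    """Link every word to the word that immediately follows it.

    Simplification for this sketch: edges are undirected and unweighted,
    and a word repeated twice in a row does not create a self-loop.
    """
    neighbors = defaultdict(set)
    for first, second in zip(tokens, tokens[1:]):
        if first != second:
            neighbors[first].add(second)
            neighbors[second].add(first)
    return dict(neighbors)


def average_degree(neighbors):
    """Mean number of distinct neighbors per word."""
    if not neighbors:
        return 0.0
    return sum(len(nbrs) for nbrs in neighbors.values()) / len(neighbors)


def average_clustering(neighbors):
    """Mean local clustering coefficient; words with degree < 2 count as 0."""
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue
        ordered = sorted(nbrs)
        # Count pairs of neighbors that are themselves adjacent.
        links = sum(
            1
            for i, u in enumerate(ordered)
            for v in ordered[i + 1:]
            if v in neighbors[u]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors)


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the cat"
    for subtext in split_into_subtexts(text.split(), 6):
        network = word_adjacency_network(subtext)
        print(subtext, average_degree(network), average_clustering(network))
```

Comparing such measurements across subtexts of the same book is one way to check whether they remain stable as the sample size shrinks. The per-subtext values could also serve as feature vectors for a classifier, although the features and classifier settings used in the paper are not given on this page.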

Suggested Citation

  • Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.
  • Handle: RePEc:plo:pone00:0118394
    DOI: 10.1371/journal.pone.0118394

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118394
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0118394&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Ranzivelle Marianne Roxas-Villanueva & Maelori Krista Nambatac & Giovanni Tapang, 2012. "Characterizing English Poetic Style Using Complex Networks," International Journal of Modern Physics C (IJMPC), World Scientific Publishing Co. Pte. Ltd., vol. 23(02), pages 1-7.
    2. Amancio, Diego R. & Nunes, Maria G.V. & Oliveira, Osvaldo N. & Costa, Luciano da F., 2012. "Extractive summarization using complex networks and syntactic dependency," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(4), pages 1855-1864.
    3. Diego Raphael Amancio & Cesar Henrique Comin & Dalcimar Casanova & Gonzalo Travieso & Odemir Martinez Bruno & Francisco Aparecido Rodrigues & Luciano da Fontoura Costa, 2014. "A Systematic Comparison of Supervised Classifiers," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-14, April.
    4. J. P. Herrera & P. A. Pury, 2008. "Statistical keyword detection in literary corpora," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 63(1), pages 135-146, May.
    5. Dangalchev, Chavdar, 2004. "Generation models for scale-free networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 338(3), pages 659-671.
    6. Iwona Grabska-Gradzińska & Andrzej Kulig & Jarosław Kwapień & Stanisław Drożdż, 2012. "Complex Network Analysis Of Literary And Scientific Texts," International Journal of Modern Physics C (IJMPC), World Scientific Publishing Co. Pte. Ltd., vol. 23(07), pages 1-15.
    7. repec:uts:ppaper:2013:3 is not listed on IDEAS
    8. Maryam Ebrahimpour & Tālis J Putniņš & Matthew J Berryman & Andrew Allison & Brian W-H Ng & Derek Abbott, 2013. "Automated Authorship Attribution Using Advanced Signal Classification Techniques," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-12, February.
    9. Carretero-Campos, C. & Bernaola-Galván, P. & Coronado, A.V. & Carpena, P., 2013. "Improving statistical keyword detection in short texts: Entropic and clustering approaches," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(6), pages 1481-1492.
    10. Ramon Ferrer i Cancho & Ricard V. Solé, 2001. "The Small-World of Human Language," Working Papers 01-03-016, Santa Fe Institute.
    11. Paczuski, M. & Hughes, D., 2004. "A heavenly example of scale-free networks and self-organized criticality," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 342(1), pages 158-163.
    12. Amancio, D.R. & Oliveira, O.N. & Costa, L. da F., 2011. "On the concepts of complex networks to quantify the difficulty in finding the way out of labyrinths," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(23), pages 4673-4683.
    13. Ranzivelle Marianne Roxas & Giovanni Tapang, 2010. "Prose And Poetry Classification And Boundary Detection Using Word Adjacency Network Analysis," International Journal of Modern Physics C (IJMPC), World Scientific Publishing Co. Pte. Ltd., vol. 21(04), pages 503-512.
    14. Liu, Haitao, 2008. "The complexity of Chinese syntactic dependency networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(12), pages 3048-3058.
    15. Efstathios Stamatatos, 2009. "A survey of modern authorship attribution methods," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(3), pages 538-556, March.

    Citations



    Cited by:

    1. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
    2. de Arruda, Henrique F. & Silva, Filipi N. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2019. "Connecting network science and information theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 641-648.
    3. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    4. Corrêa, Edilson A. & Amancio, Diego R., 2019. "Word sense induction using word embeddings and community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 180-190.
    5. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    6. Ferraz de Arruda, Henrique & Reia, Sandro Martinelli & Silva, Filipi Nascimento & Amancio, Diego Raphael & da Fontoura Costa, Luciano, 2022. "Finding contrasting patterns in rhythmic properties between prose and poetry," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 598(C).
    7. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    8. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    9. Quispe, Laura V.C. & Tohalino, Jorge A.V. & Amancio, Diego R., 2021. "Using virtual edges to improve the discriminability of co-occurrence text networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
    10. Akimushkin, Camilo & Amancio, Diego R. & Oliveira, Osvaldo N., 2018. "On the role of words in the network structure of texts: Application to authorship attribution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 495(C), pages 49-58.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ghosh, Dipak & Chakraborty, Sayantan & Samanta, Shukla, 2019. "Study of translational effect in Tagore’s Gitanjali using Chaos based Multifractal analysis technique," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 1343-1354.
    2. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    3. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    4. Jamaati, Maryam & Mehri, Ali, 2018. "Text mining by Tsallis entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 490(C), pages 1368-1376.
    5. Mehri, Ali & Agahi, Hamzeh & Mehri-Dehnavi, Hossein, 2019. "A novel word ranking method based on distorted entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 521(C), pages 484-492.
    6. Amancio, Diego R. & Oliveira Jr., Osvaldo N. & Costa, Luciano da F., 2012. "Structure–semantics interplay in complex networks and its effects on the predictability of similarity in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(18), pages 4406-4419.
    7. Haoran Zhu & Lei Lei, 2022. "The Research Trends of Text Classification Studies (2000–2020): A Bibliometric Analysis," SAGE Open, vol. 12(2), pages 21582440221, April.
    8. Sheng, Long & Li, Chunguang, 2009. "English and Chinese languages as weighted complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 388(12), pages 2561-2570.
    9. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    10. Woon Peng Goh & Kang-Kwong Luke & Siew Ann Cheong, 2018. "Functional shortcuts in language co-occurrence networks," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-18, September.
    11. Silva, Filipi N. & Amancio, Diego R. & Bardosova, Maria & Costa, Luciano da F. & Oliveira, Osvaldo N., 2016. "Using network science and text analytics to produce surveys in a scientific topic," Journal of Informetrics, Elsevier, vol. 10(2), pages 487-502.
    12. Bian, Tian & Hu, Jiantao & Deng, Yong, 2017. "Identifying influential nodes in complex networks based on AHP," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 479(C), pages 422-436.
    13. Liang, Wei & Shi, Yuming & Huang, Qiuling, 2014. "Modeling the Chinese language as an evolving network," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 393(C), pages 268-276.
    14. Ted Briscoe, 2008. "Language learning, power laws, and sexual selection," Mind & Society: Cognitive Studies in Economics and Social Sciences, Springer;Fondazione Rosselli, vol. 7(1), pages 65-76, June.
    15. Bahrami, Mohammad & Chinichian, Narges & Hosseiny, Ali & Jafari, Gholamreza & Ausloos, Marcel, 2020. "Optimization of the post-crisis recovery plans in scale-free networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 540(C).
    16. Priscila T M Saito & Rodrigo Y M Nakamura & Willian P Amorim & João P Papa & Pedro J de Rezende & Alexandre X Falcão, 2015. "Choosing the Most Effective Pattern Classification Model under Learning-Time Constraint," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-23, June.
    17. Nils-Axel Mörner, 2018. "Evaluation of the Performance and Efficiency of the Automated Linguistic Features for Author Identification in Short Text Messages Using Different Variable Selection Techniques," Studies in Media and Communication, Redfame publishing, vol. 6(2), pages 83-102, December.
    18. Maryam Ebrahimpour & Tālis J Putniņš & Matthew J Berryman & Andrew Allison & Brian W-H Ng & Derek Abbott, 2013. "Automated Authorship Attribution Using Advanced Signal Classification Techniques," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-12, February.
    19. Mariane Barros Neiva & Patrick Guidotti & Odemir Martinez Bruno, 2018. "Enhancing LBP by preprocessing via anisotropic diffusion," International Journal of Modern Physics C (IJMPC), World Scientific Publishing Co. Pte. Ltd., vol. 29(08), pages 1-29, August.
    20. Yu, Shuiyuan & Liu, Haitao & Xu, Chunshan, 2011. "Statistical properties of Chinese phonemic networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(7), pages 1370-1380.
