Printed from https://ideas.repec.org/a/eee/phsmap/v557y2020ics0378437120304635.html

Semantic flow in language networks discriminates texts by genre and publication date

Author

Listed:
  • Corrêa, Edilson A.
  • Marinho, Vanessa Q.
  • Amancio, Diego R.

Abstract

We propose a framework to characterize documents based on their semantic flow. The proposed framework encompasses a network-based model that connects sentences based on their semantic similarity. Semantic fields are detected using standard community detection methods. As the story unfolds, transitions between semantic fields are represented in Markov networks, which in turn are characterized via network motifs (subgraphs). Here we show that different book characteristics (such as genre and publication date) are discriminated by the adopted semantic flow representation. Remarkably, even without a systematic optimization of parameters, philosophy and investigative books were discriminated with an accuracy rate of 92.5%. While the objective of this study is not to create a text classification method, we believe that semantic flow features could be used in traditional network-based models of texts that capture only syntactical/stylistic information to improve the characterization of texts.
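
The transition step described in the abstract can be illustrated with a minimal sketch. It assumes each sentence has already been assigned a semantic-field label (for example by community detection on the sentence similarity network) and estimates first-order transition probabilities between consecutive labels. The function names, the toy label sequence, and the choice of reciprocal pairs as an example motif are all hypothetical and are not taken from the paper, whose exact procedure is not given on this page.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate first-order Markov transition probabilities between
    consecutive semantic-field labels (one label per sentence)."""
    counts = defaultdict(Counter)
    for src, dst in zip(labels, labels[1:]):
        counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def count_reciprocal_pairs(probs):
    """Count unordered pairs of distinct fields linked in both directions.
    This is only the simplest two-node motif, used here as an illustration."""
    pairs = set()
    for src, row in probs.items():
        for dst in row:
            if src != dst and src in probs.get(dst, {}):
                pairs.add(frozenset((src, dst)))
    return len(pairs)


# Hypothetical input: the semantic field of each sentence, in reading order.
labels = ["A", "A", "B", "A", "C", "C", "B"]
probs = transition_probabilities(labels)
reciprocal = count_reciprocal_pairs(probs)
```

In this toy sequence, field A moves to A, B, and C once each, so each of its outgoing transitions gets probability 1/3. A fuller analysis would count larger subgraphs over the resulting directed network and use their frequencies as features.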

Suggested Citation

  • Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
  • Handle: RePEc:eee:phsmap:v:557:y:2020:i:c:s0378437120304635
    DOI: 10.1016/j.physa.2020.124895

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437120304635
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2020.124895?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Mayra Z Rodriguez & Cesar H Comin & Dalcimar Casanova & Odemir M Bruno & Diego R Amancio & Luciano da F Costa & Francisco A Rodrigues, 2019. "Clustering algorithms: A comparative approach," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-34, January.
    2. Marcelo A Montemurro & Damián H Zanette, 2013. "Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis," PLOS ONE, Public Library of Science, vol. 8(6), pages 1-9, June.
    3. de Arruda, Henrique F. & Silva, Filipi N. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2019. "Connecting network science and information theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 641-648.
    4. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.
    5. Ebeling, Werner & Neiman, Alexander, 1995. "Long-range correlations between letters and sentences in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 215(3), pages 233-241.
    6. Corrêa Jr., Edilson A. & Silva, Filipi N. & da F. Costa, Luciano & Amancio, Diego R., 2017. "Patterns of authors contribution in scientific manuscripts," Journal of Informetrics, Elsevier, vol. 11(2), pages 498-510.
    7. Michele Tumminello & Salvatore Miccichè & Fabrizio Lillo & Jyrki Piilo & Rosario N Mantegna, 2011. "Statistically Validated Networks in Bipartite Complex Systems," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-11, March.
    8. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    9. Silva, Filipi N. & Amancio, Diego R. & Bardosova, Maria & Costa, Luciano da F. & Oliveira, Osvaldo N., 2016. "Using network science and text analytics to produce surveys in a scientific topic," Journal of Informetrics, Elsevier, vol. 10(2), pages 487-502.
    10. Corrêa, Edilson A. & Amancio, Diego R., 2019. "Word sense induction using word embeddings and community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 180-190.
    11. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Stefan Claus & Massimo Stella, 2022. "Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts," Future Internet, MDPI, vol. 14(10), pages 1-18, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    2. Tohalino, Jorge A.V. & Amancio, Diego R., 2022. "On predicting research grants productivity via machine learning," Journal of Informetrics, Elsevier, vol. 16(2).
    3. de Arruda, Henrique F. & Silva, Filipi N. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2019. "Connecting network science and information theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 641-648.
    4. Brito, Ana C.M. & Silva, Filipi N. & de Arruda, Henrique F. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2021. "Classification of abrupt changes along viewing profiles of scientific articles," Journal of Informetrics, Elsevier, vol. 15(2).
    5. Quispe, Laura V.C. & Tohalino, Jorge A.V. & Amancio, Diego R., 2021. "Using virtual edges to improve the discriminability of co-occurrence text networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
    6. Ana C. M. Brito & Filipi N. Silva & Diego R. Amancio, 2023. "Analyzing the influence of prolific collaborations on authors productivity and visibility," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2471-2487, April.
    7. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
    8. Corrêa, Edilson A. & Amancio, Diego R., 2019. "Word sense induction using word embeddings and community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 180-190.
    9. Guerreiro, Lucas & Silva, Filipi N. & Amancio, Diego R., 2024. "Recovering network topology and dynamics from sequences: A machine learning approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 638(C).
    10. Jeong, Yoo Kyung & Xie, Qing & Yan, Erjia & Song, Min, 2020. "Examining drug and side effect relation using author–entity pair bipartite networks," Journal of Informetrics, Elsevier, vol. 14(1).
    11. Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "Hybrid self-optimized clustering model based on citation links and textual features to detect research topics," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-21, October.
    12. Ferraz de Arruda, Henrique & Reia, Sandro Martinelli & Silva, Filipi Nascimento & Amancio, Diego Raphael & da Fontoura Costa, Luciano, 2022. "Finding contrasting patterns in rhythmic properties between prose and poetry," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 598(C).
    13. Brito, Ana C.M. & Silva, Filipi N. & Amancio, Diego R., 2021. "Associations between author-level metrics in subsequent time periods," Journal of Informetrics, Elsevier, vol. 15(4).
    14. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    15. Xiomara S. Q. Chacon & Thiago C. Silva & Diego R. Amancio, 2020. "Comparing the impact of subfields in scientific journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 625-639, October.
    16. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    17. Han, Rui-Qi & Li, Ming-Xia & Chen, Wei & Zhou, Wei-Xing & Stanley, H. Eugene, 2019. "Structural properties of statistically validated empirical information networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 747-756.
    18. Andrea Flori & Fabrizio Lillo & Fabio Pammolli & Alessandro Spelta, 2021. "Better to stay apart: asset commonality, bipartite network centrality, and investment strategies," Annals of Operations Research, Springer, vol. 299(1), pages 177-213, April.
    19. Adele Ravagnani & Fabrizio Lillo & Paola Deriu & Piero Mazzarisi & Francesca Medda & Antonio Russo, 2024. "Dimensionality reduction techniques to support insider trading detection," Papers 2403.00707, arXiv.org, revised May 2024.
    20. Fernandez Martinez, Roberto & Lostado Lorza, Ruben & Santos Delgado, Ana Alexandra & Piedra, Nelson, 2021. "Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL," Journal of Informetrics, Elsevier, vol. 15(1).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.