
Text structuring methods based on complex network: a systematic review

Author

Listed:
  • Samuel Zanferdini Oliva

    (Kidopi Soluções em Informática Ltda)

  • Livia Oliveira-Ciabati

    (Kidopi Soluções em Informática Ltda)

  • Denise Gazotto Dezembro

    (Kidopi Soluções em Informática Ltda)

  • Mário Sérgio Adolfi Júnior

    (Kidopi Soluções em Informática Ltda)

  • Maísa Carvalho Silva

    (Kidopi Soluções em Informática Ltda)

  • Hugo Cesar Pessotti

    (Kidopi Soluções em Informática Ltda)

  • Juliana Tarossi Pollettini

    (Kidopi Soluções em Informática Ltda)

Abstract

A large amount of text is currently shared through the Internet. These texts come in different forms: structured, unstructured, and semi-structured. There are different ways of analyzing texts, but domain experts usually divide the process into steps such as pre-processing, feature extraction, and a final step that, depending on the purpose, may be classification, clustering, summarization, or keyword extraction. For this processing, several approaches have been proposed in the literature, based on variations of methods such as artificial neural networks and deep learning. In this paper, we conducted a systematic review of papers that use complex network approaches to analyze text. The main results showed that complex network topological properties, measures, and modeling can capture and identify text structures for different purposes, such as text analysis, classification, topic and keyword extraction, and summarization. We conclude that complex network topological properties provide promising strategies for processing texts, considering their different aspects and structures.
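
To make the idea in the abstract concrete, the following is a minimal sketch of one common way to model a text as a complex network: a word co-occurrence network in which words are nodes and neighbouring words are linked, with node degree used as a simple topological measure for ranking candidate keywords. It is not the method of the reviewed paper or of any specific study. The stopword list, the function names, the `max_distance` parameter, and the sample text are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "are", "for"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_network(tokens, max_distance=1):
    """Build an undirected network linking words at most `max_distance` tokens apart."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + max_distance]:
            if other != word:  # ignore self-loops
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_by_degree(graph, top=3):
    """Rank words by degree (number of distinct neighbours), ties broken alphabetically."""
    return sorted(graph, key=lambda w: (-len(graph[w]), w))[:top]


# Illustrative sample text (made up for this example).
sample = (
    "Complex networks model text. Networks of words capture text structure. "
    "Text networks support keyword extraction."
)
network = cooccurrence_network(tokenize(sample))
print(rank_by_degree(network, top=2))
```

Degree is only the simplest choice here. The same network could be scored with other topological measures (for example clustering coefficient or betweenness), which is the kind of variation the abstract refers to when it mentions topological properties and measures.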

Suggested Citation

  • Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:2:d:10.1007_s11192-020-03785-y
    DOI: 10.1007/s11192-020-03785-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03785-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-020-03785-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    2. Corrêa, Edilson A. & Amancio, Diego R., 2019. "Word sense induction using word embeddings and community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 180-190.
    3. Xiaofang Wo & Guichen Li & Yuantian Sun & Jinghua Li & Sen Yang & Haoran Hao, 2022. "The Changing Tendency and Association Analysis of Intelligent Coal Mines in China: A Policy Text Mining Study," Sustainability, MDPI, vol. 14(18), pages 1-14, September.
    4. Corrêa Jr., Edilson A. & Silva, Filipi N. & da F. Costa, Luciano & Amancio, Diego R., 2017. "Patterns of authors contribution in scientific manuscripts," Journal of Informetrics, Elsevier, vol. 11(2), pages 498-510.
    5. Shakibian, Hadi & Charkari, Nasrollah Moghadam, 2018. "Statistical similarity measures for link prediction in heterogeneous complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 501(C), pages 248-263.
    6. Adilson Vital & Diego R. Amancio, 2022. "A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 6011-6028, October.
    7. de Arruda, Henrique F. & Silva, Filipi N. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2019. "Connecting network science and information theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 641-648.
    8. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    9. Akimushkin, Camilo & Amancio, Diego R. & Oliveira, Osvaldo N., 2018. "On the role of words in the network structure of texts: Application to authorship attribution," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 495(C), pages 49-58.
    10. Liu, Yanyan & Li, Keping & Yan, Dongyang & Gu, Shuang, 2022. "A network-based CNN model to identify the hidden information in text data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 590(C).
    11. Ferraz de Arruda, Henrique & Reia, Sandro Martinelli & Silva, Filipi Nascimento & Amancio, Diego Raphael & da Fontoura Costa, Luciano, 2022. "Finding contrasting patterns in rhythmic properties between prose and poetry," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 598(C).
    12. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    13. Xiomara S. Q. Chacon & Thiago C. Silva & Diego R. Amancio, 2020. "Comparing the impact of subfields in scientific journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 625-639, October.
    14. Woon Peng Goh & Kang-Kwong Luke & Siew Ann Cheong, 2018. "Functional shortcuts in language co-occurrence networks," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-18, September.
    15. Tohalino, Jorge A.V. & Amancio, Diego R., 2022. "On predicting research grants productivity via machine learning," Journal of Informetrics, Elsevier, vol. 16(2).
    16. Ana C. M. Brito & Filipi N. Silva & Diego R. Amancio, 2023. "Analyzing the influence of prolific collaborations on authors productivity and visibility," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2471-2487, April.
    17. Henrique F. Arruda & Cesar H. Comin & Luciano da F. Costa, 2018. "How integrated are theoretical and applied physics?," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1113-1121, August.
    18. Bernhardt, Lea & Dewenter, Ralf & Thomas, Tobias, 2020. "Measuring partisan media bias in US Newscasts from 2001-2012," Working Paper 183/2020, Helmut Schmidt University, Hamburg, revised 15 Nov 2022.
    19. Ntentas, Raphael, 2021. "Quantifying political populism and examining the link with economic insecurity: evidence from Greece," LSE Research Online Documents on Economics 112579, London School of Economics and Political Science, LSE Library.
    20. Lin, Annie E. & Young, Jimmy A. & Guarino, Jeannine E., 2022. "Mother-Daughter sexual abuse: An exploratory study of the experiences of survivors of MDSA using Reddit," Children and Youth Services Review, Elsevier, vol. 138(C).
