
Extractive summarization using complex networks and syntactic dependency

Author

Listed:
  • Amancio, Diego R.
  • Nunes, Maria G.V.
  • Oliveira, Osvaldo N.
  • Costa, Luciano da F.

Abstract

The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general.
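The abstract describes ranking text units by complex-network metrics such as betweenness. The sketch below illustrates that general idea only and is not the authors' method. It assumes sentences are nodes, that an edge links two sentences sharing at least one non-stopword token, and that the summary is the k sentences with the highest betweenness. The stopword list, the tokenizer, and the function names are all hypothetical choices made for this example.

```python
import re
from collections import deque

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "on", "it"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS}


def build_graph(sentences):
    """Undirected graph: sentences are nodes, shared words create edges."""
    words = [content_words(s) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if words[i] & words[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)   # BFS distance from s
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies back to s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def summarize(text, k=2):
    """Return the k highest-betweenness sentences in their original order."""
    sentences = split_sentences(text)
    scores = betweenness(build_graph(sentences))
    top = sorted(scores, key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize("Cats chase mice. Mice eat cheese. Cheese comes from milk.", k=1)` selects the middle sentence, because it is the only one lying on a shortest path between the other two. The vulnerability and diversity metrics named in the abstract would need their own scoring functions and are not covered here.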

Suggested Citation

  • Amancio, Diego R. & Nunes, Maria G.V. & Oliveira, Osvaldo N. & Costa, Luciano da F., 2012. "Extractive summarization using complex networks and syntactic dependency," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(4), pages 1855-1864.
  • Handle: RePEc:eee:phsmap:v:391:y:2012:i:4:p:1855-1864
    DOI: 10.1016/j.physa.2011.10.015

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437111007953
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2011.10.015?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Barabási, Albert-László & Albert, Réka & Jeong, Hawoong, 2000. "Scale-free characteristics of random networks: the topology of the world-wide web," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 281(1), pages 69-77.
    2. H. Bauke, 2007. "Parameter estimation for power-law distributions by maximum likelihood methods," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 58(2), pages 167-173, July.
    3. Viana, Matheus Palhares & Travençolo, Bruno A.N. & Tanck, Esther & Costa, Luciano da Fontoura, 2010. "Characterizing topological and dynamical properties of complex networks without border effects," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(8), pages 1771-1778.
    4. Amancio, D.R. & Nunes, M.G.V. & Oliveira, O.N. & Pardo, T.A.S. & Antiqueira, L. & da F. Costa, L., 2011. "Using metrics from complex networks to evaluate machine translation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(1), pages 131-142.
    5. Antiqueira, L. & Nunes, M.G.V. & Oliveira Jr., O.N. & F. Costa, L. da, 2007. "Strong correlations between text quality and complex networks features," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 373(C), pages 811-820.

    Citations


    Cited by:

    1. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    2. Martinčić-Ipšić, Sanda & Margan, Domagoj & Meštrović, Ana, 2016. "Multilayer network of language: A unified framework for structural analysis of linguistic subsystems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 457(C), pages 117-128.
    3. Silva, Filipi N. & Amancio, Diego R. & Bardosova, Maria & Costa, Luciano da F. & Oliveira, Osvaldo N., 2016. "Using network science and text analytics to produce surveys in a scientific topic," Journal of Informetrics, Elsevier, vol. 10(2), pages 487-502.
    4. Amancio, Diego R. & Oliveira Jr., Osvaldo N. & Costa, Luciano da F., 2012. "Structure–semantics interplay in complex networks and its effects on the predictability of similarity in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(18), pages 4406-4419.
    5. Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "Hybrid self-optimized clustering model based on citation links and textual features to detect research topics," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-21, October.
    6. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    7. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Yanhui & Bi, Lifeng & Lin, Shuai & Li, Man & Shi, Hao, 2017. "A complex network-based importance measure for mechatronics systems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 180-198.
    2. Liu, Yanyan & Li, Keping & Yan, Dongyang & Gu, Shuang, 2022. "A network-based CNN model to identify the hidden information in text data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 590(C).
    3. D. R. Amancio & M. G. V. Nunes & O. N. Oliveira & L. F. Costa, 2012. "Using complex networks concepts to assess approaches for citations in scientific papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 827-842, June.
    4. Amancio, Diego R. & Oliveira Jr., Osvaldo N. & Costa, Luciano da F., 2012. "Structure–semantics interplay in complex networks and its effects on the predictability of similarity in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(18), pages 4406-4419.
    5. Ruiz Vargas, E. & Mitchell, D.G.V. & Greening, S.G. & Wahl, L.M., 2014. "Topology of whole-brain functional MRI networks: Improving the truncated scale-free model," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 405(C), pages 151-158.
    6. Giacomello, Giampiero & Picci, Lucio, 2003. "My scale or your meter? Evaluating methods of measuring the Internet," Information Economics and Policy, Elsevier, vol. 15(3), pages 363-383, September.
    7. Kwame Boamah‐Addo & Tomasz J. Kozubowski & Anna K. Panorska, 2023. "A discrete truncated Zipf distribution," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 77(2), pages 156-187, May.
    8. Ormerod, Paul & Roach, Andrew P, 2004. "The Medieval inquisition: scale-free networks and the suppression of heresy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 339(3), pages 645-652.
    9. Castagna, Alina & Chentouf, Leila & Ernst, Ekkehard, 2017. "Economic vulnerabilities in Italy: A network analysis using similarities in sectoral employment," GLO Discussion Paper Series 50, Global Labor Organization (GLO).
    10. Pascal Billand & Christophe Bravard & Sudipta Sarangi, 2011. "Resources Flows Asymmetries in Strict Nash Networks with Partner Heterogeneity," Working Papers 1108, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    11. Wei, Daijun & Deng, Xinyang & Zhang, Xiaoge & Deng, Yong & Mahadevan, Sankaran, 2013. "Identifying influential nodes in weighted networks based on evidence theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(10), pages 2564-2575.
    12. Jiang, Jingchi & Zheng, Jichuan & Zhao, Chao & Su, Jia & Guan, Yi & Yu, Qiubin, 2016. "Clinical-decision support based on medical literature: A complex network approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 42-54.
    13. Zio, Enrico, 2016. "Challenges in the vulnerability and risk analysis of critical infrastructures," Reliability Engineering and System Safety, Elsevier, vol. 152(C), pages 137-150.
    14. Tamás Sebestyén & Dóra Longauer, 2018. "Network structure, equilibrium and dynamics in a monopolistically competitive economy," Netnomics, Springer, vol. 19(3), pages 131-157, December.
    15. Wang, Huan & Xu, Chuan-Yun & Hu, Jing-Bo & Cao, Ke-Fei, 2014. "A complex network analysis of hypertension-related genes," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 394(C), pages 166-176.
    16. Elisa Letizia & Fabrizio Lillo, 2017. "Corporate payments networks and credit risk rating," Papers 1711.07677, arXiv.org, revised Sep 2018.
    17. Dunia López-Pintado, 2006. "Contagion and coordination in random networks," International Journal of Game Theory, Springer;Game Theory Society, vol. 34(3), pages 371-381, October.
    18. Li, Jianyu & Zhou, Jie, 2007. "Chinese character structure analysis based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 380(C), pages 629-638.
    19. Laureti, Paolo & Moret, Lionel & Zhang, Yi-Cheng, 2005. "Aggregating partial, local evaluations to achieve global ranking," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 345(3), pages 705-712.
    20. Jean-Marc Luck & Anita Mehta, 2023. "Evolution of grammatical forms: some quantitative approaches," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 96(2), pages 1-15, February.
