
Cited text span identification for scientific summarisation using pre-trained encoders

Author

Listed:

  • Chrysoula Zerva (University of Manchester)
  • Minh-Quoc Nghiem (University of Manchester)
  • Nhung T. H. Nguyen (University of Manchester)
  • Sophia Ananiadou (University of Manchester; Alan Turing Institute)

Abstract

We present our approach for the identification of cited text spans in scientific literature, using pre-trained encoders (BERT) in combination with different neural networks. We further experiment to assess the impact of using these cited text spans as input to BERT-based extractive summarisation methods. Inspired and motivated by the CL-SciSumm shared tasks, we explore different methods to adapt pre-trained models, which are tuned for the general domain, to scientific literature. For the identification of cited text spans, we assess the impact of different configurations in terms of learning from augmented data and using different features and network architectures (BERT, XLNet, CNN, and BiMPM) for training. We show that fine-tuning the language models on unlabelled or augmented domain-specific data can improve the performance of cited text span identification models. For scientific summarisation, we implement an extractive summarisation model adapted from BERT. With respect to the input sentences taken from the cited paper, we explore two scenarios: (1) using all sentences (full text) of the referenced article as input, and (2) using only the text spans that have been identified as cited by other publications. We observe that in certain experiments, using only the cited text spans achieves better performance while minimising the input size needed.
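The abstract describes a two-stage pipeline: a pre-trained encoder (BERT) scores, for each citing sentence (citance), which sentences of the referenced paper form the cited text span, and a BERT-based extractive summariser then takes either the full text or only those identified spans as input. The sketch below is a minimal, hypothetical illustration of that pipeline rather than the authors' released code: the checkpoint name, the probability threshold, and the helper functions score_candidates and select_summary_input are placeholders, and the XLNet, CNN, and BiMPM variants and the data-augmentation experiments mentioned in the abstract are not shown.

    # Minimal sketch (assumed pipeline, not the authors' code) of cited text span
    # identification as sentence-pair classification, followed by input selection
    # for an extractive summariser.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Placeholder checkpoint; in the paper, BERT-style encoders are fine-tuned on
    # CL-SciSumm-style (citance, reference sentence) pairs.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # label 1 ~ "sentence is in the cited span"
    )

    def score_candidates(citance, candidates):
        """Return P(cited) for every candidate sentence of the referenced paper."""
        inputs = tokenizer(
            [citance] * len(candidates),  # sentence A: the citing sentence
            candidates,                   # sentence B: a candidate reference sentence
            padding=True, truncation=True, return_tensors="pt",
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1)[:, 1].tolist()

    def select_summary_input(citances, reference_sentences, threshold=0.5):
        """Scenario (2): keep only sentences identified as cited text spans."""
        keep = set()
        for citance in citances:
            probs = score_candidates(citance, reference_sentences)
            keep.update(i for i, p in enumerate(probs) if p >= threshold)
        return [reference_sentences[i] for i in sorted(keep)]

Restricting the summariser to the selected sentences is what lets scenario (2) shrink the summariser's input while, as the abstract reports, sometimes also improving performance.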

Suggested Citation

  • Chrysoula Zerva & Minh-Quoc Nghiem & Nhung T. H. Nguyen & Sophia Ananiadou, 2020. "Cited text span identification for scientific summarisation using pre-trained encoders," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 3109-3137, December.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03455-z
    DOI: 10.1007/s11192-020-03455-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03455-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-020-03455-z?utm_source=ideas
    LibKey link: If access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Saeed-Ul Hassan & Mubashir Imran & Sehrish Iqbal & Naif Radi Aljohani & Raheel Nawaz, 2018. "Deep context of citations using machine-learning models in scholarly full-text articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1645-1662, December.
    2. B Ian Hutchins & Xin Yuan & James M Anderson & George M Santangelo, 2016. "Relative Citation Ratio (RCR): A New Metric That Uses Citation Rates to Measure Influence at the Article Level," PLOS Biology, Public Library of Science, vol. 14(9), pages 1-25, September.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Moreno La Quatra & Luca Cagliero & Elena Baralis, 2021. "Leveraging full-text article exploration for citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8275-8293, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    2. A Cecile J W Janssens & Michael Goodman & Kimberly R Powell & Marta Gwinn, 2017. "A critical evaluation of the algorithm behind the Relative Citation Ratio (RCR)," PLOS Biology, Public Library of Science, vol. 15(10), pages 1-5, October.
    3. Adrian G Barnett & Pauline Zardo & Nicholas Graves, 2018. "Randomly auditing research labs could be an affordable way to improve research quality: A simulation study," PLOS ONE, Public Library of Science, vol. 13(4), pages 1-17, April.
    4. Bowen Song & Chunjuan Luan & Danni Liang, 2023. "Identification of emerging technology topics (ETTs) using BERT-based model and sematic analysis: a perspective of multiple-field characteristics of patented inventions (MFCOPIs)," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(11), pages 5883-5904, November.
    5. Mohammed S. Alqahtani & Mohamed Abbas & Mohammed Abdul Muqeet & Hussain M. Almohiy, 2022. "Research Productivity in Terms of Output, Impact, and Collaboration for University Researchers in Saudi Arabia: SciVal Analytics and t-Tests Statistical Based Approach," Sustainability, MDPI, vol. 14(23), pages 1-21, December.
    6. Thelwall, Mike, 2018. "Dimensions: A competitor to Scopus and the Web of Science?," Journal of Informetrics, Elsevier, vol. 12(2), pages 430-435.
    7. Ruihua Qi & Jia Wei & Zhen Shao & Zhengguang Li & Heng Chen & Yunhao Sun & Shaohua Li, 2023. "Multi-task learning model for citation intent classification in scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(12), pages 6335-6355, December.
    8. Li, Heyang & Wu, Meijun & Wang, Yougui & Zeng, An, 2022. "Bibliographic coupling networks reveal the advantage of diversification in scientific projects," Journal of Informetrics, Elsevier, vol. 16(3).
    9. Yang, Alex Jie & Wu, Linwei & Zhang, Qi & Wang, Hao & Deng, Sanhong, 2023. "The k-step h-index in citation networks at the paper, author, and institution levels," Journal of Informetrics, Elsevier, vol. 17(4).
    10. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    11. Corrêa Jr., Edilson A. & Silva, Filipi N. & da F. Costa, Luciano & Amancio, Diego R., 2017. "Patterns of authors contribution in scientific manuscripts," Journal of Informetrics, Elsevier, vol. 11(2), pages 498-510.
    12. Mahira Ahmad & Amina Muazzam & Ambreen Anjum & Anna Visvizi & Raheel Nawaz, 2020. "Linking Work-Family Conflict (WFC) and Talent Management: Insights from a Developing Country," Sustainability, MDPI, vol. 12(7), pages 1-17, April.
    13. Torres-Salinas, Daniel & Valderrama-Baca, Pilar & Arroyo-Machado, Wenceslao, 2022. "Is there a need for a new journal metric? Correlations between JCR Impact Factor metrics and the Journal Citation Indicator—JCI," Journal of Informetrics, Elsevier, vol. 16(3).
    14. Joseph Staudt & Huifeng Yu & Robert P Light & Gerald Marschke & Katy Börner & Bruce A Weinberg, 2018. "High-impact and transformative science (HITS) metrics: Definition, exemplification, and comparison," PLOS ONE, Public Library of Science, vol. 13(7), pages 1-23, July.
    15. Chun-Kai Huang & Cameron Neylon & Lucy Montgomery & Richard Hosking & James P. Diprose & Rebecca N. Handcock & Katie Wilson, 2024. "Open access research outputs receive more diverse citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(2), pages 825-845, February.
    16. Yuan Zhou & Fang Dong & Yufei Liu & Liang Ran, 2021. "A deep learning framework to early identify emerging technologies in large-scale outlier patents: an empirical study of CNC machine tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 969-994, February.
    17. Li, Xin & Tang, Xuli & Lu, Wei, 2024. "Investigating clinical links in edge-labeled citation networks of biomedical research: A translational science perspective," Journal of Informetrics, Elsevier, vol. 18(3).
    18. Heng Huang & Donghua Zhu & Xuefeng Wang, 2022. "Evaluating scientific impact of publications: combining citation polarity and purpose," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5257-5281, September.
    19. Lutz Bornmann & Alexander Tekles & Loet Leydesdorff, 2019. "How well does I3 perform for impact measurement compared to other bibliometric indicators? The convergent validity of several (field-normalized) indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 1187-1205, May.
    20. Jay Bhattacharya & Mikko Packalen, 2020. "Stagnation and Scientific Incentives," NBER Working Papers 26752, National Bureau of Economic Research, Inc.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03455-z. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows us to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.