Printed from https://ideas.repec.org/a/eee/infome/v14y2020i2s1751157719301051.html

Effect of class imbalance in heterogeneous network embedding: An empirical study

Authors

  • Anil, Akash
  • Singh, Sanasam Ranbir

Abstract

Network science has been extensively explored for solving various bibliometrics tasks such as co-authorship prediction, author classification, author clustering, author ranking, and paper ranking. While the majority of past studies exploit homogeneous bibliographic networks (consisting of a single type of node and edge), there has recently been a surge in modeling heterogeneous bibliographic entities and their interdependencies using heterogeneous information networks (HINs). Unlike a homogeneous bibliographic network, a bibliographic HIN consists of multi-typed nodes, such as Author, Paper, and Venue, and the corresponding relations. A bibliographic HIN is therefore more complex: it captures the rich semantics of the underlying bibliographic data, but it also poses more challenges. Since a real-world HIN may have different numbers of instances for different node types, class imbalance is ubiquitous. Recent studies discuss class imbalance only briefly and exploit meta-path-based strategies to address the issue. However, there is no work that quantitatively studies the effect of class imbalance on solving real-world bibliometrics tasks. Therefore, this paper first proposes a metric to estimate class imbalance in an HIN and then studies the effects of class imbalance on two bibliometrics tasks, namely (i) co-authorship prediction and (ii) authors' research area classification, using node features generated by network embedding-based frameworks on the DBLP dataset. Various experimental analyses show that class imbalance is an inherent characteristic of bibliographic HINs, and that for better performance on the above-mentioned bibliometrics tasks, bibliographic HINs must include Author, Paper, and Venue as node types.
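The abstract does not define the proposed imbalance metric, so the sketch below does not reproduce it. As a hedged illustration only, it computes two generic measures of how unevenly nodes are spread across types in an HIN: the max-to-min imbalance ratio and the normalized Shannon entropy of the node-type distribution. The function names and the toy node counts are hypothetical and chosen purely for demonstration.

```python
from collections import Counter
import math

# Hypothetical helpers: generic imbalance measures, NOT the metric proposed in the paper.

def imbalance_ratio(counts):
    """Largest node-type count divided by the smallest; 1.0 means perfectly balanced."""
    values = [c for c in counts.values() if c > 0]
    return max(values) / min(values)

def normalized_entropy(counts):
    """Shannon entropy of the node-type distribution divided by log(k).

    Returns 1.0 for a perfectly balanced distribution and values closer
    to 0.0 as one node type dominates. With fewer than two types the
    measure is undefined, so 1.0 (trivially balanced) is returned.
    """
    values = [c for c in counts.values() if c > 0]
    k = len(values)
    if k < 2:
        return 1.0
    total = sum(values)
    h = -sum((c / total) * math.log(c / total) for c in values)
    return h / math.log(k)

# Toy bibliographic HIN: these type labels and counts are made up for illustration.
node_types = ["Author"] * 6 + ["Paper"] * 3 + ["Venue"] * 1
counts = Counter(node_types)

print(imbalance_ratio(counts))     # 6.0
print(normalized_entropy(counts))  # about 0.817, below 1.0 because Author nodes dominate
```

Both measures equal 1.0 when every node type has the same number of instances; in a real bibliographic HIN the per-type counts would come from the dataset itself rather than from a toy list.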

Suggested Citation

  • Anil, Akash & Singh, Sanasam Ranbir, 2020. "Effect of class imbalance in heterogeneous network embedding: An empirical study," Journal of Informetrics, Elsevier, vol. 14(2).
  • Handle: RePEc:eee:infome:v:14:y:2020:i:2:s1751157719301051
    DOI: 10.1016/j.joi.2020.101009

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157719301051
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2020.101009?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Lungeanu, Alina & Huang, Yun & Contractor, Noshir S., 2014. "Understanding the assembly of interdisciplinary teams and its impact on performance," Journal of Informetrics, Elsevier, vol. 8(1), pages 59-70.
    2. Dorte Henriksen, 2016. "The rise in co-authorship in the social sciences (1980–2013)," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 455-476, May.
    3. Chen, Shiji & Arsenault, Clément & Larivière, Vincent, 2015. "Are top-cited papers more interdisciplinary?," Journal of Informetrics, Elsevier, vol. 9(4), pages 1034-1046.
    4. Yang, Jiansheng & Vannier, Michael W. & Wang, Fang & Deng, Yan & Ou, Fengrong & Bennett, James & Liu, Yang & Wang, Ge, 2013. "A bibliometric analysis of academic publication and NIH funding," Journal of Informetrics, Elsevier, vol. 7(2), pages 318-324.
    5. Dondio, Pierpaolo & Casnici, Niccolò & Grimaldo, Francisco & Gilbert, Nigel & Squazzoni, Flaminio, 2019. "The “invisible hand” of peer review: The implications of author-referee networks on peer review in a scholarly journal," Journal of Informetrics, Elsevier, vol. 13(2), pages 708-716.
    6. Bettencourt, Luís M.A. & Kaiser, David I. & Kaur, Jasleen, 2009. "Scientific discovery and topological transitions in collaboration networks," Journal of Informetrics, Elsevier, vol. 3(3), pages 210-221.
    7. Rodriguez, Marko A. & Pepe, Alberto, 2008. "On the relationship between the structural and socioacademic communities of a coauthorship network," Journal of Informetrics, Elsevier, vol. 2(3), pages 195-201.
    8. Zuo, Zhiya & Zhao, Kang, 2018. "The more multidisciplinary the better? – The prevalence and interdisciplinarity of research collaborations in multidisciplinary institutions," Journal of Informetrics, Elsevier, vol. 12(3), pages 736-756.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Lee, O-Joun & Jeon, Hyeon-Ju & Jung, Jason J., 2021. "Learning multi-resolution representations of research patterns in bibliographic networks," Journal of Informetrics, Elsevier, vol. 15(1).
    2. Wang, Ruby W. & Wei, Shelia X. & Ye, Fred Y., 2021. "Extracting a core structure from heterogeneous information network using h-subnet and meta-path strength," Journal of Informetrics, Elsevier, vol. 15(3).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John McLevey & Alexander V. Graham & Reid McIlroy-Young & Pierson Browne & Kathryn S. Plaisance, 2018. "Interdisciplinarity and insularity in the diffusion of knowledge: an analysis of disciplinary boundaries between philosophy of science and the sciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 331-349, October.
    2. Yu, Xiaoyao & Szymanski, Boleslaw K. & Jia, Tao, 2021. "Become a better you: Correlation between the change of research direction and the change of scientific performance," Journal of Informetrics, Elsevier, vol. 15(3).
    3. Luka Ursić & Godfrey Baldacchino & Željana Bašić & Ana Belén Sainz & Ivan Buljan & Miriam Hampel & Ivana Kružić & Mia Majić & Ana Marušić & Franck Thetiot & Ružica Tokalić & Leandra Vranješ Markić, 2022. "Factors Influencing Interdisciplinary Research and Industry-Academia Collaborations at Six European Universities: A Qualitative Study," Sustainability, MDPI, vol. 14(15), pages 1-24, July.
    4. Zhao, Star X. & Rousseau, Ronald & Ye, Fred Y., 2011. "h-Degree as a basic measure in weighted networks," Journal of Informetrics, Elsevier, vol. 5(4), pages 668-677.
    5. Shufang Huang & Jin Chen & Liang Mei & Weiqiao Mo, 2019. "The Effect of Heterogeneity and Leadership on Innovation Performance: Evidence from University Research Teams in China," Sustainability, MDPI, vol. 11(16), pages 1-14, August.
    6. Citron, Daniel T. & Way, Samuel F., 2018. "Network assembly of scientific communities of varying size and specificity," Journal of Informetrics, Elsevier, vol. 12(1), pages 181-190.
    7. Gregorio González-Alcaide, 2021. "Bibliometric studies outside the information science and library science field: uncontainable or uncontrollable?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6837-6870, August.
    8. Meijun Liu & Sijie Yang & Yi Bu & Ning Zhang, 2023. "Female early-career scientists have conducted less interdisciplinary research in the past six decades: evidence from doctoral theses," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-16, December.
    9. Xian Li & Ronald Rousseau & Liming Liang & Fangjie Xi & Yushuang Lü & Yifan Yuan & Xiaojun Hu, 2022. "Is low interdisciplinarity of references an unexpected characteristic of Nobel Prize winning research?," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2105-2122, April.
    10. Sahatqija, Kosovare & Kadriu, Arbana, 2019. "Exploring Gender Role in Co-Authorship Networks for Computing Books: A Case Study in DBLP," in: Proceedings of the ENTRENOVA - ENTerprise REsearch InNOVAtion Conference, Rovinj, Croatia, 12-14 September 2019, pages 33-39, IRENET - Society for Advancing Innovation and Research in Economy, Zagreb.
    11. Yunwei Chen & Katy Börner & Shu Fang, 2013. "Evolving collaboration networks in Scientometrics in 1978–2010: a micro–macro analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 1051-1070, June.
    12. Schlecht, Colleen & McGuier, Elizabeth A. & Ann Huang, Lee & Daro, Deborah, 2023. "Creating an interdisciplinary collaborative network of scholars in child maltreatment prevention: A network analysis of the Doris Duke Fellowships for the Promotion of Child Well-Being," Children and Youth Services Review, Elsevier, vol. 153(C).
    13. Arnauld Bessagnet & Joan Crespo & Jerome Vicente, 2023. "How is the literature on Digital Entrepreneurial Ecosystems structured? A socio-semantic network approach," Papers in Evolutionary Economic Geography (PEEG) 2320, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Oct 2023.
    14. Zhentao Liang & Jin Mao & Gang Li, 2023. "Bias against scientific novelty: A prepublication perspective," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 99-114, January.
    15. Wynne E. Norton & Alina Lungeanu & David A. Chambers & Noshir Contractor, 2017. "Mapping the growing discipline of dissemination and implementation science in health," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1367-1390, September.
    16. Lambiotte, R. & Panzarasa, P., 2009. "Communities, knowledge creation, and information diffusion," Journal of Informetrics, Elsevier, vol. 3(3), pages 180-190.
    17. Shiji Chen & Yanhui Song & Fei Shu & Vincent Larivière, 2022. "Interdisciplinarity and impact: the effects of the citation time window," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2621-2642, May.
    18. Jordi Ardanuy & Llorenç Arguimbau & Ángel Borrego, 2022. "Social sciences and humanities research funded under the European Union Sixth Framework Programme (2002–2006): a long-term assessment of projects, acknowledgements and publications," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-13, December.
    19. Carlos B. Amat & François Perruchas, 2016. "Evolving cohesion metrics of a research network on rare diseases: a longitudinal study over 14 years," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(1), pages 41-56, July.
    20. Bulent Ozel, 2012. "Collaboration structure and knowledge diffusion in Turkish management academia," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(1), pages 183-206, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:infome:v:14:y:2020:i:2:s1751157719301051. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with the CitEc correction form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/joi .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.