
An artificial intelligence-based framework for data-driven categorization of computer scientists: a case study of world’s Top 10 computing departments

Authors

Listed:
  • Nisar Ali

    (Ghulam Ishaq Khan Institute of Engineering Sciences and Technology
    University of Regina)

  • Zahid Halim

    (Ghulam Ishaq Khan Institute of Engineering Sciences and Technology)

  • Syed Fawad Hussain

    (University of Birmingham)

Abstract

The total number of published articles and the resulting citations are generally acknowledged as suitable criteria for evaluating scientists. However, ranking scientists is challenging because the value of their scientific work is (at times) not directly reflected in these measures. Multiple other elements therefore need to be examined in combination to better evaluate the scientific worth of an individual. This work presents a learning-based technique, i.e., an Artificial Intelligence (AI)-based solution for categorizing scientists using multifaceted criteria. In this context, a novel ranking metric is proposed that is grounded on authorship, experience, publication count, total citations, i10-index, and h-index. To assess the proposed framework's performance, a dataset is collected covering the world's top ten computing departments and ten domestic ones, resulting in data on 1000 computer scientists. The dataset is preprocessed, and three feature-selection techniques are then employed, namely Mutual Information (MI), Chi-Square (χ²), and the Fisher Test (F-Test), to rank the features in the data. To validate the collected data, the framework also includes three clustering techniques, namely k-medoids, k-means, and spectral clustering, to identify the optimum number of heterogeneous groups. Three cluster validity indices are used to evaluate the clustering outcomes, namely the Calinski-Harabasz Index (CHI), the Davies-Bouldin Index (DBI), and the Silhouette Coefficient (SC). Once the optimum clusters are obtained, five classifiers are used, namely Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Linear Regression Classifier (LRC), to predict the category of a previously unknown scientist. Among all classifiers, the ANN shows an average accuracy of 94.44% in predicting the category of an unknown/new scientist. The proposal is also compared with closely related past works. The proposed framework offers the possibility to independently classify scientists based on AI techniques.
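
The abstract outlines a concrete pipeline: rank the six bibliometric features, cluster the scientists and pick the number of groups with validity indices, then train classifiers to label new scientists. The sketch below is a minimal illustration of that pipeline using scikit-learn; it is not the authors' code, the random matrix merely stands in for the 1000-scientist dataset, and every model setting and numeric choice in it is an assumed placeholder.

    # Hypothetical sketch of the pipeline described in the abstract (scikit-learn).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
    from sklearn.metrics import (accuracy_score, calinski_harabasz_score,
                                 davies_bouldin_score, silhouette_score)
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import MinMaxScaler

    rng = np.random.default_rng(0)
    # Placeholder for the six features: authorship, experience, publication count,
    # total citations, i10-index, and h-index (scaled so chi2 sees non-negatives).
    X = MinMaxScaler().fit_transform(rng.random((1000, 6)))

    # Choose a cluster count with k-means; the silhouette coefficient decides here,
    # while CHI and DBI are printed alongside since the abstract reports all three.
    best_k, best_sc = 2, -1.0
    for k in range(2, 7):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        sc = silhouette_score(X, labels)
        print(k, calinski_harabasz_score(X, labels),
              davies_bouldin_score(X, labels), sc)
        if sc > best_sc:
            best_k, best_sc = k, sc
    y = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

    # Rank features against the discovered categories with the three scores
    # named in the abstract: mutual information, chi-square, and the F-test.
    mi_scores = mutual_info_classif(X, y, random_state=0)
    chi2_scores, _ = chi2(X, y)
    f_scores, _ = f_classif(X, y)

    # Train one of the five classifiers mentioned (an ANN) to predict the
    # category of a previously unseen scientist.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
    ann.fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, ann.predict(X_te)))

With the real dataset, the same structure would allow the remaining clustering methods (k-medoids, spectral) and classifiers (k-NN, DT, GNB, LRC) to be swapped in and compared under the same validity indices and accuracy measure.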

Suggested Citation

  • Nisar Ali & Zahid Halim & Syed Fawad Hussain, 2023. "An artificial intelligence-based framework for data-driven categorization of computer scientists: a case study of world’s Top 10 computing departments," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1513-1545, March.
  • Handle: RePEc:spr:scient:v:128:y:2023:i:3:d:10.1007_s11192-022-04627-9
    DOI: 10.1007/s11192-022-04627-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-022-04627-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-022-04627-9?utm_source=ideas
    LibKey link: If access is restricted and your library uses this service, LibKey will redirect you to a location where you can use your library subscription to access this item.

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Bouyssou, Denis & Marchant, Thierry, 2016. "Ranking authors using fractional counting of citations: An axiomatic approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 183-199.
    2. Upul Senanayake & Mahendra Piraveenan & Albert Zomaya, 2015. "The Pagerank-Index: Going beyond Citation Counts in Quantifying Scientific Impact of Researchers," PLOS ONE, Public Library of Science, vol. 10(8), pages 1-34, August.
    3. Ying Ding, 2011. "Applying weighted PageRank to author citation networks," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(2), pages 236-245, February.
    4. John S. Liu & Louis Y.Y. Lu, 2012. "An integrated approach for main path analysis: Development of the Hirsch index as an example," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(3), pages 528-542, March.
    5. Christoph Bartneck & Servaas Kokkelmans, 2011. "Detecting h-index manipulation through self-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(1), pages 85-98, April.
    6. Zahid Halim & Shafaq Khan, 2019. "A data science-based framework to categorize academic journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 393-423, April.
    7. Chao Gao & Zhen Wang & Xianghua Li & Zili Zhang & Wei Zeng, 2016. "PR-Index: Using the h-Index and PageRank for Determining True Impact," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-13, September.
    8. Ying Ding & Guo Zhang & Tamy Chambers & Min Song & Xiaolong Wang & Chengxiang Zhai, 2014. "Content-based citation analysis: The next generation of citation analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(9), pages 1820-1833, September.
    9. Sven E. Hug & Michael Ochsner & Martin P. Brändle, 2017. "Citation analysis with microsoft academic," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(1), pages 371-378, April.
    10. Leo Egghe, 2006. "Theory and practise of the g-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(1), pages 131-152, October.
    11. Jiang, Xiaorui & Zhuge, Hai, 2019. "Forward search path count as an alternative indirect citation impact indicator," Journal of Informetrics, Elsevier, vol. 13(4).
    12. Lutz Bornmann & Hans-Dieter Daniel, 2007. "What do we know about the h index?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(9), pages 1381-1385, July.
    13. Mark P. Carpenter & Francis Narin, 1981. "The adequacy of the science citation index (SCI) as an indicator of international scientific activity," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 32(6), pages 430-439, November.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dinesh Pradhan & Partha Sarathi Paul & Umesh Maheswari & Subrata Nandi & Tanmoy Chakraborty, 2017. "C³-index: a PageRank based multi-faceted metric for authors’ performance measurement," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 253-273, January.
    2. Hao Wang & Hua-Wei Shen & Xue-Qi Cheng, 2016. "Scientific credit diffusion: Researcher level or paper level?," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 827-837, November.
    3. Nadia Simoes & Nuno Crespo, 2020. "A flexible approach for measuring author-level publishing performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 331-355, January.
    4. Yubing Nie & Yifan Zhu & Qika Lin & Sifan Zhang & Pengfei Shi & Zhendong Niu, 2019. "Academic rising star prediction via scholar’s evaluation model and machine learning techniques," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 461-476, August.
    5. Zhang, Fang & Wu, Shengli, 2020. "Predicting future influence of papers, researchers, and venues in a dynamic academic network," Journal of Informetrics, Elsevier, vol. 14(2).
    6. Chao Lu & Ying Ding & Chengzhi Zhang, 2017. "Understanding the impact change of a highly cited article: a content-based citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 927-945, August.
    7. Chen, Meiqian & Guo, Zhaoxia & Dong, Yucheng & Chiclana, Francisco & Herrera-Viedma, Enrique, 2021. "Citations optimal growth path: A tool to analyze sensitivity to citations of h-like indexes," Journal of Informetrics, Elsevier, vol. 15(4).
    8. Rok Blagus & Brane L. Leskošek & Janez Stare, 2015. "Comparison of bibliometric measures for assessing relative importance of researchers," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1743-1762, December.
    9. Dejian Yu & Zhaoping Yan, 2022. "Combining machine learning and main path analysis to identify research front: from the perspective of science-technology linkage," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 4251-4274, July.
    10. Jiang, Xiaorui & Zhuge, Hai, 2019. "Forward search path count as an alternative indirect citation impact indicator," Journal of Informetrics, Elsevier, vol. 13(4).
    11. Huang, Chen-Hao & Liu, John S. & Ho, Mei Hsiu-Ching & Chou, Tzu-Chuan, 2022. "Towards more convergent main paths: A relevance-based approach," Journal of Informetrics, Elsevier, vol. 16(3).
    12. Yu, Dejian & Yan, Zhaoping, 2023. "Main path analysis considering citation structure and content: Case studies in different domains," Journal of Informetrics, Elsevier, vol. 17(1).
    13. Liu, John S. & Lu, Louis Y.Y. & Ho, Mei Hsiu-Ching, 2012. "Total influence and mainstream measures for scientific researchers," Journal of Informetrics, Elsevier, vol. 6(4), pages 496-504.
    14. Xiaorui Jiang & Junjun Liu, 2023. "Extracting the evolutionary backbone of scientific domains: The semantic main path network analysis approach based on citation context analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(5), pages 546-569, May.
    15. Lai, Kuei-Kuei & Bhatt, Priyanka C. & Kumar, Vimal & Chen, Hsueh-Chen & Chang, Yu-Hsin & Su, Fang-Pei, 2021. "Identifying the impact of patent family on the patent trajectory: A case of thin film solar cells technological trajectories," Journal of Informetrics, Elsevier, vol. 15(2).
    16. Ruijie Wang & Yuhao Zhou & An Zeng, 2023. "Evaluating scientists by citation and disruption of their representative works," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1689-1710, March.
    17. Mei Hsiu-Ching Ho & John S. Liu, 2021. "The swift knowledge development path of COVID-19 research: the first 150 days," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2391-2399, March.
    18. Yan, Jianghui & Tseng, Fang-Mei & Lu, Louis Y.Y., 2018. "Developmental trajectories of new energy vehicle research in economic management: Main path analysis," Technological Forecasting and Social Change, Elsevier, vol. 137(C), pages 168-181.
    19. Vîiu, Gabriel-Alexandru, 2016. "A theoretical evaluation of Hirsch-type bibliometric indicators confronted with extreme self-citation," Journal of Informetrics, Elsevier, vol. 10(2), pages 552-566.
    20. Dejing Kong & Jianzhong Yang & Lingfeng Li, 2020. "Early identification of technological convergence in numerical control machine tool: a deep learning approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 1983-2009, December.
