
Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers

Author

Listed:
  • Chowdhury, K.P.

Abstract

This article introduces a versatile functional form for Generalized Linear Models (GLMs) through a simple yet effective transformation of the current framework. The models are applied, through a new hierarchical Bayesian estimation procedure for logistic regression, to highly-cited papers in the Management Information Systems (MIS) field. The results are uniformly better with regard to model fit and inference, for in-sample and out-of-sample data and for both simulation studies and real-world data applications, and the method requires very little time to converge to the true population parameters. In simulation studies, I show that the method's confidence intervals contain the true parameters nearly three times as often as those of widely used existing GLMs while being 54.50% smaller, and that the method requires around two-thirds the number of MCMC iterations of existing Bayesian methods. In scientometric applications, the methodology is shown to be highly robust, with predictive/classification accuracy equaling or exceeding that of existing methods for identifying highly-cited articles, including Artificial Neural Networks (ANN). Thus, the method is shown to be robust to the amount of asymmetry (or symmetry) of the probability of success (or failure), to unbalanced samples, and to varying data generating processes. Further, the methodology is equivalent to current methods if the data support them and is therefore complementary to existing methods, without loss of interpretability of model parameters. For the MIS field, it finds that the Popularity Parameter (PP) of an article's keywords can predict whether a paper will be highly cited (top 25% of highly-cited articles) from two to three years after publication and beyond. Furthermore, given the small number of iterations needed for convergence, the methodology can also be used as a baseline method in Big Data (BD) settings for both Artificial Intelligence (AI) and Machine Learning (ML) contexts.
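
The abstract does not state the transformation itself, so the following is only an illustrative sketch of what an asymmetric link that nests the standard logit can look like. It uses a power-logistic inverse link, which is a swapped-in textbook construction and not the article's functional form; the names `power_logistic`, `lam`, and `asymmetry` are hypothetical. With `lam = 1` the link reduces exactly to the ordinary logistic link, which mirrors the claim that a generalized form can coincide with current methods when the data support them.

```python
import math


def logistic(eta: float) -> float:
    """Standard inverse logit link; symmetric around eta = 0."""
    return 1.0 / (1.0 + math.exp(-eta))


def power_logistic(eta: float, lam: float) -> float:
    """Illustrative skewed inverse link (an assumption, not the paper's form).

    Raises the logistic CDF to the power lam > 0. lam = 1 recovers the
    ordinary logit link; other values make the curve asymmetric.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return logistic(eta) ** lam


def asymmetry(lam: float, eta: float = 1.0) -> float:
    """Return F(eta) + F(-eta) - 1, which is zero for a symmetric link."""
    return power_logistic(eta, lam) + power_logistic(-eta, lam) - 1.0


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        print(f"lam={lam}: asymmetry at eta=1 is {asymmetry(lam):+.4f}")
```

In this sketch, a skewness parameter estimated close to 1 would indicate that the symmetric logit is adequate, while values away from 1 signal that success and failure probabilities approach their limits at different rates.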

Suggested Citation

  • Chowdhury, K.P., 2021. "Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers," Journal of Informetrics, Elsevier, vol. 15(1).
  • Handle: RePEc:eee:infome:v:15:y:2021:i:1:s1751157720306295
    DOI: 10.1016/j.joi.2020.101112

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157720306295
    Download Restriction: Full text for ScienceDirect subscribers only

    Citations

    Cited by:

    1. Grazia Sveva Ascione & Laura Ciucci & Claudio Detotto & Valerio Sterzi, 2022. "Universities involvement in patent litigation: an analysis of the characteristics of US litigated patents," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 6855-6879, December.
    2. Chowdhury, K.P., 2023. "Nonparametric functional analysis under joint estimation with applications to identifying highly cited papers," Journal of Informetrics, Elsevier, vol. 17(4).
    3. Wang, Zhenhua & Ren, Ming & Gao, Dong & Li, Zhuang, 2023. "A Zipf's law-based text generation approach for addressing imbalance in entity extraction," Journal of Informetrics, Elsevier, vol. 17(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hu, Ya-Han & Tai, Chun-Tien & Liu, Kang Ernest & Cai, Cheng-Fang, 2020. "Identification of highly-cited papers using topic-model-based and bibliometric features: the consideration of keyword popularity," Journal of Informetrics, Elsevier, vol. 14(1).
    2. Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
    3. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    4. Wang, Xing & Zhang, Zhihui, 2020. "Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact," Journal of Informetrics, Elsevier, vol. 14(2).
    5. Wumei Du & Zheng Xie & Yiqin Lv, 2021. "Predicting publication productivity for authors: Shallow or deep architecture?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5855-5879, July.
    6. Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
    7. Zhao, Qihang & Feng, Xiaodong, 2022. "Utilizing citation network structure to predict paper citation counts: A Deep learning approach," Journal of Informetrics, Elsevier, vol. 16(1).
    8. Xie, Zheng, 2020. "Predicting publication productivity for researchers: A piecewise Poisson model," Journal of Informetrics, Elsevier, vol. 14(3).
    9. Li, Xin & Ma, Xiaodi & Feng, Ye, 2024. "Early identification of breakthrough research from sleeping beauties using machine learning," Journal of Informetrics, Elsevier, vol. 18(2).
    10. Tehmina Amjad & Nafeesa Shahid & Ali Daud & Asma Khatoon, 2022. "Citation burst prediction in a bibliometric network," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2773-2790, May.
    11. Kehan Wang & Wenxuan Shi & Junsong Bai & Xiaoping Zhao & Liying Zhang, 2021. "Prediction and application of article potential citations based on nonlinear citation-forecasting combined model," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6533-6550, August.
    12. Sato, Ryoma & Yamada, Makoto & Kashima, Hisashi, 2022. "Poincare: Recommending Publication Venues via Treatment Effect Estimation," Journal of Informetrics, Elsevier, vol. 16(2).
    13. Fang Zhang & Shengli Wu, 2024. "Predicting citation impact of academic papers across research areas using multiple models and early citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4137-4166, July.
    14. Ruan, Xuanmin & Zhu, Yuanyang & Li, Jiang & Cheng, Ying, 2020. "Predicting the citation counts of individual papers via a BP neural network," Journal of Informetrics, Elsevier, vol. 14(3).
    15. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    16. Kong, Ling & Wang, Dongbo, 2020. "Comparison of citations and attention of cover and non-cover papers," Journal of Informetrics, Elsevier, vol. 14(4).
    17. Giorgio Gulino & Federico Masera, 2023. "Contagious Dishonesty: Corruption Scandals and Supermarket Theft," American Economic Journal: Applied Economics, American Economic Association, vol. 15(4), pages 218-251, October.
    18. Qianqian Jin & Hongshu Chen & Ximeng Wang & Tingting Ma & Fei Xiong, 2022. "Exploring funding patterns with word embedding-enhanced organization–topic networks: a case study on big data," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5415-5440, September.
    19. Weitzel, Utz & Kirchler, Michael, 2023. "The Banker’s oath and financial advice," Journal of Banking & Finance, Elsevier, vol. 148(C).
    20. Kiran Tomlinson & Johan Ugander & Austin R. Benson, 2021. "Choice Set Confounding in Discrete Choice," Papers 2105.07959, arXiv.org, revised Aug 2021.
