
A machine learning approach to graduate admissions and the role of letters of recommendation

Authors
  • Yijun Zhao
  • Xiaoyu Chen
  • Haoran Xue
  • Gary M Weiss

Abstract

The graduate admissions process is time-consuming, subjective, and complicated by the need to combine information from diverse data sources. Letters of recommendation (LORs) are particularly difficult to evaluate, and it is unclear how much impact they have on admissions decisions. This study addresses these concerns by building machine learning models to predict admissions decisions for two STEM graduate programs, with a focus on examining the contribution of LORs to the decision-making process. We train our predictive models on information extracted from structured application forms (e.g., undergraduate GPA and standardized test scores), applicants’ resumes, and LORs. A particular challenge in our study is that application data come in different modalities (i.e., text vs. structured forms). To address this issue, we convert the textual LORs into features using a commercial natural language processing product and a manual rating process that we developed. By analyzing the predictive performance of the models on different subsets of features, we show that LORs alone provide only a modest, but useful, predictive signal for admissions decisions; the best model for predicting admissions decisions used both LOR and non-LOR data and achieved 89% accuracy. Our experiments demonstrate promising results for the utility of automated systems in assisting with graduate admissions decisions. The findings confirm the value of LORs and the effectiveness of our methods for engineering features from LOR text. This study also assesses the significance of individual features using the SHAP method, thereby providing insight into key factors affecting graduate admissions decisions.
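
The feature-subset comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names (gpa, test_score, lor_rating) and the SUBSETS grouping are hypothetical, and a small hand-written logistic regression stands in for whatever models the authors actually trained. It shows only the general pattern of fitting the same kind of classifier on LOR-only, non-LOR, and combined features and comparing held-out accuracy.

```python
import math
import random

# Hypothetical feature groups; names are illustrative, not taken from the paper.
SUBSETS = {
    "lor_only": ["lor_rating"],
    "non_lor": ["gpa", "test_score"],
    "combined": ["gpa", "test_score", "lor_rating"],
}


def make_synthetic_applicants(n, seed=0):
    """Return n (features, admitted) pairs drawn from a toy, made-up process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        feats = {k: rng.gauss(0.0, 1.0) for k in ("gpa", "test_score", "lor_rating")}
        # Assumed latent score: LORs carry a weaker signal than the other features.
        latent = (1.5 * feats["gpa"] + 1.0 * feats["test_score"]
                  + 0.6 * feats["lor_rating"] + rng.gauss(0.0, 1.0))
        rows.append((feats, 1 if latent > 0 else 0))
    return rows


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_logit(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            err = sigmoid(predict_logit(w, b, x)) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    hits = sum(1 for x, label in zip(X, y)
               if (predict_logit(w, b, x) > 0) == (label == 1))
    return hits / len(y)


def evaluate_subsets(rows, subsets, train_frac=0.8):
    """Train one model per feature subset and report held-out accuracy."""
    split = int(len(rows) * train_frac)
    train, held_out = rows[:split], rows[split:]
    results = {}
    for name, cols in subsets.items():
        X_train = [[f[c] for c in cols] for f, _ in train]
        y_train = [label for _, label in train]
        X_test = [[f[c] for c in cols] for f, _ in held_out]
        y_test = [label for _, label in held_out]
        w, b = fit_logistic(X_train, y_train)
        results[name] = accuracy(w, b, X_test, y_test)
    return results


results = evaluate_subsets(make_synthetic_applicants(300), SUBSETS)
for name, acc in results.items():
    print(f"{name}: held-out accuracy = {acc:.2f}")
```

Because the data are generated rather than real, the printed accuracies say nothing about actual admissions; the sketch is meant only to make the ablation design concrete. A per-feature attribution step such as SHAP would be applied to the fitted combined model and is not shown here.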

Suggested Citation

  • Yijun Zhao & Xiaoyu Chen & Haoran Xue & Gary M Weiss, 2023. "A machine learning approach to graduate admissions and the role of letters of recommendation," PLOS ONE, Public Library of Science, vol. 18(10), pages 1-17, October.
  • Handle: RePEc:plo:pone00:0291107
    DOI: 10.1371/journal.pone.0291107

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291107
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0291107&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    2. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    3. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    4. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    5. Juan Shi & Kin Keung Lai & Ping Hu & Gang Chen, 2018. "Factors dominating individual information disseminating behavior on social networking sites," Information Technology and Management, Springer, vol. 19(2), pages 121-139, June.
    6. Ganesh Dash & Chetan Sharma & Shamneesh Sharma, 2023. "Sustainable Marketing and the Role of Social Media: An Experimental Study Using Natural Language Processing (NLP)," Sustainability, MDPI, vol. 15(6), pages 1-16, March.
    7. repec:osf:socarx:49qxk_v1 is not listed on IDEAS
    8. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    9. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.
    10. Gissler, Stefan & Oldfather, Jeremy & Ruffino, Doriana, 2016. "Lending on hold: Regulatory uncertainty and bank lending standards," Journal of Monetary Economics, Elsevier, vol. 81(C), pages 89-101.
    11. Alina Evstigneeva & Mark Sidorovskiy, 2021. "Assessment of Clarity of Bank of Russia Monetary Policy Communication by Neural Network Approach," Russian Journal of Money and Finance, Bank of Russia, vol. 80(3), pages 3-33, September.
    12. repec:osf:socarx:8jbvg_v1 is not listed on IDEAS
    13. Hei-Chia Wang & Tzu-Ting Hsu & Yunita Sari, 2019. "Personal research idea recommendation using research trends and a hierarchical topic model," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1385-1406, December.
    14. Marcin Chlebus & Maciej Stefan Świtała, 2020. "So close and so far. Finding similar tendencies in econometrics and machine learning papers. Topic models comparison," Working Papers 2020-16, Faculty of Economic Sciences, University of Warsaw.
    15. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W. & Lessmann, Stefan, 2020. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1563-1578.
    16. Hutchison, Paul D. & Daigle, Ronald J. & George, Benjamin, 2018. "Application of latent semantic analysis in AIS academic research," International Journal of Accounting Information Systems, Elsevier, vol. 31(C), pages 83-96.
    17. Emad Mohamed & Sayed A. Mostafa, 2019. "Computing Happiness from Textual Data," Stats, MDPI, vol. 2(3), pages 1-24, July.
    18. Jake R. Nelson & Tony H. Grubesic, 2018. "Environmental Justice: A Panoptic Overview Using Scientometrics," Sustainability, MDPI, vol. 10(4), pages 1-18, March.
    19. Lüdering Jochen & Winker Peter, 2016. "Forward or Backward Looking? The Economic Discourse and the Observed Reality," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(4), pages 483-515, August.
    20. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    21. Jonathan H. Ashtor, 2019. "Investigating Cohort Similarity as an Ex Ante Alternative to Patent Forward Citations," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 16(4), pages 848-880, December.
    22. Agha Mohammad Ali Kermani, Mehrdad & Fatemi Ardestani, Seyed Farshad & Aliahmadi, Alireza & Barzinpour, Farnaz, 2017. "A novel game theoretic approach for modeling competitive information diffusion in social networks with heterogeneous nodes," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 570-582.
