Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i8p1289-d792690.html

Mining Campus Big Data: Prediction of Career Choice Using Interpretable Machine Learning Method

Author

Listed:
  • Yuan Wang

    (College of Humanities and Law, Beijing University of Chemical Technology, Beijing 100029, China;
    School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China)

  • Liping Yang

    (School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China)

  • Jun Wu

    (School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China)

  • Zisheng Song

    (Department of International Exchange and Cooperation, Beijing University of Chemical Technology, Beijing 100029, China)

  • Li Shi

    (China Information Communication Technology Group Corporation, Beijing 100191, China)

Abstract

Students’ career choice is a common concern of students, parents, and educators. However, students’ behavioral data have not been thoroughly studied as a means of understanding that choice. In this study, we used eXtreme Gradient Boosting (XGBoost), a machine learning (ML) technique, to predict the career choice of college students from a real-world dataset collected at a specific college. Specifically, the data cover the education and career choices of 18,000 graduates during their college years. In addition, SHAP (SHapley Additive exPlanations) was employed to interpret the results and analyze the importance of individual features. The results show that XGBoost can predict students’ career choice robustly, with a precision, recall, and F1 score of 89.1%, 85.4%, and 0.872, respectively. Furthermore, the interactions of features across four student outcomes (choosing to study in China, choosing to work, having difficulty finding a job, and choosing to study abroad) were also explored. Several educational features, especially differences in grade point average (GPA) over the course of college, are found to have a relatively large impact on the final career choice. These results can help in the planning, design, and implementation of higher education institutions’ (HEIs) events.
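
As a quick consistency check on the reported metrics: the F1 score is the harmonic mean of precision and recall. The minimal Python sketch below recomputes it from the precision and recall figures quoted in the abstract. The function name `f1_score` is illustrative only and is not taken from the paper, whose own evaluation code is not shown here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, converted from percentages to fractions.
precision = 0.891
recall = 0.854

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")
```

The result rounds to 0.872, which matches the F1 value stated in the abstract, so the three reported figures are mutually consistent.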

Suggested Citation

  • Yuan Wang & Liping Yang & Jun Wu & Zisheng Song & Li Shi, 2022. "Mining Campus Big Data: Prediction of Career Choice Using Interpretable Machine Learning Method," Mathematics, MDPI, vol. 10(8), pages 1-18, April.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:8:p:1289-:d:792690
    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/8/1289/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/8/1289/
    Download Restriction: no

    Citations

    Cited by:

    1. Gaeithry Manoharam & Mohd Shareduwan Mohd Kasihmuddin & Siti Noor Farwina Mohamad Anwar Antony & Nurul Atiqah Romli & Nur ‘Afifah Rusdi & Suad Abdeen & Mohd. Asyraf Mansor, 2023. "Log-Linear-Based Logic Mining with Multi-Discrete Hopfield Neural Network," Mathematics, MDPI, vol. 11(9), pages 1-30, April.
    2. Wen Zhang & Xiaofeng Xu & Jun Wu & Kaijian He, 2023. "Preface to the Special Issue on “Computational and Mathematical Methods in Information Science and Engineering”," Mathematics, MDPI, vol. 11(14), pages 1-4, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eline Moens & Louis Lippens & Philippe Sterkens & Johannes Weytjens & Stijn Baert, 2022. "The COVID-19 crisis and telework: a research survey on experiences, expectations and hopes," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 23(4), pages 729-753, June.
    2. Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
    3. Fatima Rafiq & Mazhar Javed Awan & Awais Yasin & Haitham Nobanee & Azlan Mohd Zain & Saeed Ali Bahaj, 2022. "Privacy Prevention of Big Data Applications: A Systematic Literature Review," SAGE Open, , vol. 12(2), pages 21582440221, May.
    4. Goodell, John W. & Ben Jabeur, Sami & Saâdaoui, Foued & Nasir, Muhammad Ali, 2023. "Explainable artificial intelligence modeling to forecast bitcoin prices," International Review of Financial Analysis, Elsevier, vol. 88(C).
    5. Dylan Norbert Gono & Herlina Napitupulu & Firdaniza, 2023. "Silver Price Forecasting Using Extreme Gradient Boosting (XGBoost) Method," Mathematics, MDPI, vol. 11(18), pages 1-15, September.
    6. Naveed Hayat & Muhammad Imran & Shabbir Ahmad & Adnan Ali Shahzad & Jamshaid ur Rehman, 2022. "The Effect of Mobile Phone Use on the Students’ Budget, Social Behavior and Academic Performance: A Case Study of Bacha Khan University, Charsadda, Pakistan," Journal of Policy Research (JPR), Research Foundation for Humanity (RFH), vol. 8(3), pages 122-134, September.
    7. Mahsa Tavakoli & Rohitash Chandra & Fengrui Tian & Cristián Bravo, 2023. "Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams," Papers 2304.10740, arXiv.org, revised Sep 2023.
    8. Davood Pirayesh Neghab & Mucahit Cevik & M. I. M. Wahab, 2023. "Explaining Exchange Rate Forecasts with Macroeconomic Fundamentals Using Interpretive Machine Learning," Papers 2303.16149, arXiv.org.
    9. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    10. Aras, Serkan & Hanifi Van, M., 2022. "An interpretable forecasting framework for energy consumption and CO2 emissions," Applied Energy, Elsevier, vol. 328(C).
    11. Amez, Simon & Vujić, Sunčica & De Marez, Lieven & Baert, Stijn, 2019. "Smartphone Use and Academic Performance: First Evidence from Longitudinal Data," GLO Discussion Paper Series 438, Global Labor Organization (GLO).
    12. Chetna Monga & Deepali Gupta & Devendra Prasad & Sapna Juneja & Ghulam Muhammad & Zulfiqar Ali, 2022. "Sustainable Network by Enhancing Attribute-Based Selection Mechanism Using Lagrange Interpolation," Sustainability, MDPI, vol. 14(10), pages 1-15, May.
    13. Jeronymo Marcondes Pinto & Jennifer L. Castle, 2022. "Machine Learning Dynamic Switching Approach to Forecasting in the Presence of Structural Breaks," Journal of Business Cycle Research, Springer;Centre for International Research on Economic Tendency Surveys (CIRET), vol. 18(2), pages 129-157, July.
    14. Silva, Diego M.B. & Pereira, Gustavo H.A. & Magalhães, Tiago M., 2022. "A class of categorization methods for credit scoring models," European Journal of Operational Research, Elsevier, vol. 296(1), pages 323-331.
    15. Simon Amez & Stijn Baert, 2021. "Bye, bye, Hotel Mama, bye, bye good grades? Living in a student room and exam results in tertiary education," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 21/1018, Ghent University, Faculty of Economics and Business Administration.
    16. Amez, Simon & Denecker, Floor & Ponnet, Koen & De Marez, Lieven & Baert, Stijn, 2021. "Mobile DNA and Sleep Quality," IZA Discussion Papers 14816, Institute of Labor Economics (IZA).
    17. Stijn Baert & Sunčica Vujić & Simon Amez & Matteo Claeskens & Thomas Daman & Arno Maeckelberghe & Eddy Omey & Lieven De Marez, 2020. "Smartphone Use and Academic Performance: Correlation or Causal Relationship?," Kyklos, Wiley Blackwell, vol. 73(1), pages 22-46, February.
    18. Bai, Chen & Chen, Xiaomeng & Han, Keqing, 2020. "Mobile phone addiction and school performance among Chinese adolescents from low-income families: A moderated mediation model," Children and Youth Services Review, Elsevier, vol. 118(C).
    19. Hail Jung & Jinsu Jeon & Dahui Choi & Jung-Ywn Park, 2021. "Application of Machine Learning Techniques in Injection Molding Quality Prediction: Implications on Sustainable Manufacturing Industry," Sustainability, MDPI, vol. 13(8), pages 1-16, April.
    20. Frank Cremer & Barry Sheehan & Michael Fortmann & Arash N. Kia & Martin Mullins & Finbarr Murphy & Stefan Materne, 2022. "Cyber risk and cybersecurity: a systematic review of data availability," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 47(3), pages 698-736, July.
