
Identifying key determinants of health among China’s migrant population using machine learning methods: Evidence from the china migrants dynamic survey

Author

Listed:
  • Bo Dong
  • Yuxin Zhou
  • Li Wang
  • Yiyu Wang
  • Zhenlin Zhang

Abstract

Background: Continuously improving health security for the migrant population is a key component of China’s healthcare system reform. Existing research indicates that migrant health is influenced by multiple factors, yet the relative importance of these factors has not been adequately measured. This study analyzes the current health status of China’s migrant population and ranks the main factors influencing their health by importance.
Methods: Data were drawn from the 2018 China Migrants Dynamic Survey and comprised 108,669 cases after data cleaning. The health status of the migrant population was first described using frequency and percentage distributions. Logistic regression was then applied to examine the relationship between various factors and migrant health. Subsequently, six machine learning methods (Neural Network, Random Forest, Support Vector Machine, Gradient Boosting Machine, Extra Trees, and Decision Tree) were applied to rank the importance of these factors. A multidimensional set of performance metrics (accuracy, precision, recall, F1 score, and AUC) was used to evaluate the classification performance of the models. SHAP (SHapley Additive exPlanations) values were used to illustrate the contribution of different factors to the health status of the migrant population.
Results: The health status of China’s migrant population is generally good, though it is influenced by multiple factors with varying degrees of significance. Among the six machine learning models, the Random Forest model demonstrated the best predictive performance. Its results indicate that the key factors affecting migrant health are age, employment, income, and education level. SHAP value analysis reveals that stable employment, higher education levels, and higher income are positively correlated with better health outcomes, whereas age is predominantly negatively correlated with health status, indicating a detrimental effect.
Conclusion: The overall health status of China’s migrant population is relatively good. However, their disadvantaged position in areas such as education and income exposes them to higher health risks. To address these key determinants, further improvements in health safeguards should focus on: developing stratified intervention strategies based on differences in age structure; optimizing work environments and employment security; enhancing health literacy; and strengthening public health emergency management and social support systems.
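
The five evaluation metrics named in the Methods can be made concrete with a small worked example. The sketch below is not the authors' code: the function name `classification_metrics`, the coding of "healthy" as 1, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration. It computes accuracy, precision, recall, and F1 from the confusion-matrix counts, and AUC as the share of (positive, negative) pairs in which the positive case receives the higher score.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and AUC for a binary classifier."""
    # Turn predicted probabilities into hard 0/1 predictions.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # AUC as the probability that a randomly chosen positive case is scored
    # above a randomly chosen negative case (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(
            1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
        )
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc,
    }


# Toy data (hypothetical): 1 = self-rated healthy, 0 = not healthy.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

if __name__ == "__main__":
    for name, value in classification_metrics(toy_labels, toy_scores).items():
        print(f"{name}: {value:.3f}")
```

Reporting all five metrics together matters when one class dominates, as is likely in a sample where most respondents report good health: accuracy alone can look high for a model that rarely identifies the minority class, whereas recall and AUC expose that weakness.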

Suggested Citation

  • Bo Dong & Yuxin Zhou & Li Wang & Yiyu Wang & Zhenlin Zhang, 2025. "Identifying key determinants of health among China’s migrant population using machine learning methods: Evidence from the china migrants dynamic survey," PLOS ONE, Public Library of Science, vol. 20(11), pages 1-31, November.
  • Handle: RePEc:plo:pone00:0335168
    DOI: 10.1371/journal.pone.0335168

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335168
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0335168&type=printable
    Download Restriction: no
