Printed from https://ideas.repec.org/a/plo/pone00/0335656.html

Machine learning-based prediction of metabolic dysfunction-associated steatotic liver disease using National Health and Nutrition Examination Survey (NHANES) data

Author

Listed:
  • Yong Zhang
  • Xiang Liu
  • Xingqiang Zhang
  • Yangfan Fei
  • Xiaoxu Li

Abstract

Objective: With rising global obesity rates and changing lifestyles, metabolic dysfunction-associated steatotic liver disease (MASLD) has become a prevalent chronic liver disorder, affecting approximately 25% of the global population. The disease can progress to cirrhosis and liver cancer, posing a significant threat to public health. To facilitate early diagnosis and intervention, this study aims to develop an efficient and reliable prediction model for MASLD using machine learning algorithms.

Methods: The study drew on 9,232 participants aged 20 years and older from the 2017–2020 National Health and Nutrition Examination Survey (NHANES). After excluding individuals with frequent alcohol consumption or hepatitis B/C infection, those without a liver ultrasound examination, and records with missing data, 2,460 subjects were included in the analysis. The dataset was split into training and testing sets in an 80:20 ratio. Five machine learning algorithms, including XGBoost, Random Forest (RF), and Logistic Regression (LR), were used to build prediction models, and Recursive Feature Elimination (RFE) was used to identify key predictive factors.

Results: Of the five algorithms, XGBoost performed best. RFE selected twelve key features, and the model achieved an AUC of 0.8740 on the testing set, demonstrating excellent predictive accuracy and discriminative ability. SHAP analysis of the model showed that waist circumference, BMI, and other factors played a pivotal role in predicting MASLD.

Conclusion: The prediction model built with the XGBoost algorithm and the 12 selected features demonstrates high efficiency and stability in assessing MASLD risk. The model offers a data-driven tool for the early clinical identification of high-risk populations, with the potential to optimize and refine MASLD prevention and control strategies.
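The two evaluation steps named in the abstract, an 80:20 split and an AUC on the held-out set, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names `train_test_split_80_20` and `roc_auc` are hypothetical, the split is a plain shuffled split (the abstract does not say whether stratification was used), and the labels and scores in the usage lines are made-up toy values. The AUC is computed by its pairwise definition, the fraction of (positive, negative) pairs in which the positive case receives the higher score.

```python
import random


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80:20 (illustrative, unstratified)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 2,460 is the analysed sample size reported in the abstract.
    train, test = train_test_split_80_20(range(2460))
    print(len(train), len(test))  # 1968 492

    # Toy labels and scores, not data from the study.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Under this reading, the reported AUC of 0.8740 would mean that a randomly chosen MASLD case in the testing set receives a higher predicted score than a randomly chosen non-case about 87% of the time.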

Suggested Citation

  • Yong Zhang & Xiang Liu & Xingqiang Zhang & Yangfan Fei & Xiaoxu Li, 2025. "Machine learning-based prediction of metabolic dysfunction-associated steatotic liver disease using National Health and Nutrition Examination Survey (NHANES) data," PLOS ONE, Public Library of Science, vol. 20(11), pages 1-13, November.
  • Handle: RePEc:plo:pone00:0335656
    DOI: 10.1371/journal.pone.0335656

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335656
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0335656&type=printable
    Download Restriction: no

