IDEAS home Printed from https://ideas.repec.org/a/plo/pdig00/0000951.html
   My bibliography  Save this article

Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the longitudinal ageing study in India

Author

Listed:
  • John Tayu Lee
  • Sheng Hui Hsu
  • Vincent Cheng-Sheng Li
  • Kanya Anindya
  • Meng-Huan Chen
  • Charlotte Wang
  • Toby Kai-Bo Shen
  • Valerie Tzu Ning Liu
  • Hsiao-Hui Chen
  • Rifat Atun

Abstract

Machine learning (ML) models are increasingly applied to predict body mass index (BMI) and related outcomes, yet their fairness across socioeconomic and caste groups remains uncertain, particularly in contexts of structural inequality. Using nationally representative data from more than 55,000 adults aged 45 years and older in the Longitudinal Ageing Study in India (LASI), we evaluated the accuracy and fairness of multiple ML algorithms—including Random Forest, XGBoost, Gradient Boosting, LightGBM, Deep Neural Networks, and Deep Cross Networks—alongside logistic regression for predicting underweight, overweight, and central adiposity. Models were trained on 80% of the data and tested on 20%, with performance assessed using AUROC, accuracy, sensitivity, specificity, and precision. Fairness was evaluated through subgroup analyses across socioeconomic and caste groups and equity-based metrics such as Equalized Odds and Demographic Parity. Feature importance was examined using SHAP values, and bias-mitigation methods were implemented at pre-processing, in-processing, and post-processing stages. Tree-based models, particularly LightGBM and Gradient Boosting, achieved the highest AUROC values (0.79–0.84). Incorporating socioeconomic and health-related variables improved prediction, but fairness gaps persisted: performance declined for scheduled tribes and lower socioeconomic groups. SHAP analyses identified grip strength, gender, and residence as key drivers of prediction differences. Among mitigation strategies, Reject Option Classification and Equalized Odds Post-processing moderately reduced subgroup disparities but sometimes decreased overall performance, whereas other approaches yielded minimal gains. ML models can effectively predict obesity and adiposity risk in India, but addressing bias is essential for equitable application. Continued refinement of fairness-aware ML methods is needed to support inclusive and effective public-health decision-making.Author summary: India now faces the paradox of widespread under-nutrition alongside a rising tide of obesity among its older population. We asked whether state-of-the-art machine-learning models could accurately identify individuals at highest risk of under-weight, overweight–obesity, and central adiposity while treating all social groups equitably. Using nationally representative data on more than 55,000 adults aged 45 years and above, we compared gradient-boosted decision trees, random forests, logistic regression, and other approaches with conventional regression techniques. Overall, the modern algorithms produced the strongest predictions. Yet a closer look revealed systematic shortfalls for scheduled tribes, scheduled castes, and the lowest income quintile—even when the models achieved excellent accuracy in the population as a whole. We then applied several well-established bias-mitigation strategies, such as re-weighting the training data and post-processing the decision thresholds. These interventions reduced the performance gap for disadvantaged groups, albeit at a modest cost to overall accuracy. By combining careful fairness audits with Shapley-based interpretation of feature importance, we illuminate how socioeconomic and caste-related factors shape both nutritional risk and prediction error. Our findings underscore that fair, trustworthy decision support systems in public health must be designed explicitly with equity objectives, rather than assuming that technical excellence alone will guarantee just outcomes.

Suggested Citation

  • John Tayu Lee & Sheng Hui Hsu & Vincent Cheng-Sheng Li & Kanya Anindya & Meng-Huan Chen & Charlotte Wang & Toby Kai-Bo Shen & Valerie Tzu Ning Liu & Hsiao-Hui Chen & Rifat Atun, 2025. "Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the longitudinal ageing st," PLOS Digital Health, Public Library of Science, vol. 4(11), pages 1-16, November.
  • Handle: RePEc:plo:pdig00:0000951
    DOI: 10.1371/journal.pdig.0000951
    as

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000951
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000951&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pdig.0000951?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pdig00:0000951. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: digitalhealth (email available below). General contact details of provider: https://journals.plos.org/digitalhealth .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.