Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the longitudinal ageing study in India

Author

Listed:
  • John Tayu Lee
  • Sheng Hui Hsu
  • Vincent Cheng-Sheng Li
  • Kanya Anindya
  • Meng-Huan Chen
  • Charlotte Wang
  • Toby Kai-Bo Shen
  • Valerie Tzu Ning Liu
  • Hsiao-Hui Chen
  • Rifat Atun

Abstract

Machine learning (ML) models are increasingly applied to predict body mass index (BMI) and related outcomes, yet their fairness across socioeconomic and caste groups remains uncertain, particularly in contexts of structural inequality. Using nationally representative data from more than 55,000 adults aged 45 years and older in the Longitudinal Ageing Study in India (LASI), we evaluated the accuracy and fairness of multiple ML algorithms—including Random Forest, XGBoost, Gradient Boosting, LightGBM, Deep Neural Networks, and Deep Cross Networks—alongside logistic regression for predicting underweight, overweight, and central adiposity. Models were trained on 80% of the data and tested on 20%, with performance assessed using AUROC, accuracy, sensitivity, specificity, and precision. Fairness was evaluated through subgroup analyses across socioeconomic and caste groups and equity-based metrics such as Equalized Odds and Demographic Parity. Feature importance was examined using SHAP values, and bias-mitigation methods were implemented at pre-processing, in-processing, and post-processing stages. Tree-based models, particularly LightGBM and Gradient Boosting, achieved the highest AUROC values (0.79–0.84). Incorporating socioeconomic and health-related variables improved prediction, but fairness gaps persisted: performance declined for scheduled tribes and lower socioeconomic groups. SHAP analyses identified grip strength, gender, and residence as key drivers of prediction differences. Among mitigation strategies, Reject Option Classification and Equalized Odds Post-processing moderately reduced subgroup disparities but sometimes decreased overall performance, whereas other approaches yielded minimal gains. ML models can effectively predict obesity and adiposity risk in India, but addressing bias is essential for equitable application. 
Continued refinement of fairness-aware ML methods is needed to support inclusive and effective public-health decision-making.

Author summary: India now faces the paradox of widespread under-nutrition alongside a rising tide of obesity among its older population. We asked whether state-of-the-art machine-learning models could accurately identify individuals at highest risk of under-weight, overweight–obesity, and central adiposity while treating all social groups equitably. Using nationally representative data on more than 55,000 adults aged 45 years and above, we compared gradient-boosted decision trees, random forests, logistic regression, and other approaches with conventional regression techniques. Overall, the modern algorithms produced the strongest predictions. Yet a closer look revealed systematic shortfalls for scheduled tribes, scheduled castes, and the lowest income quintile, even when the models achieved excellent accuracy in the population as a whole. We then applied several well-established bias-mitigation strategies, such as re-weighting the training data and post-processing the decision thresholds. These interventions reduced the performance gap for disadvantaged groups, albeit at a modest cost to overall accuracy. By combining careful fairness audits with Shapley-based interpretation of feature importance, we illuminate how socioeconomic and caste-related factors shape both nutritional risk and prediction error. Our findings underscore that fair, trustworthy decision support systems in public health must be designed explicitly with equity objectives, rather than assuming that technical excellence alone will guarantee just outcomes.
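
For readers unfamiliar with the two fairness metrics named in the abstract, the following is a minimal sketch of how Demographic Parity and Equalized Odds gaps are commonly computed from binary predictions. It is not the authors' code: the function names, the toy arrays, and the group labels "A" and "B" are hypothetical, and the Equalized Odds gap is taken here as the larger of the true-positive-rate gap and the false-positive-rate gap across groups, which is one common definition among several.

```python
import numpy as np


def demographic_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [np.mean(y_pred[group == g]) for g in np.unique(group)]
    return float(max(rates) - min(rates))


def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the TPR gap and the FPR gap across groups.

    Assumes every group contains at least one positive and one
    negative true label; otherwise a rate would be undefined.
    """
    tprs, fprs = [], []
    for g in np.unique(group):
        in_group = group == g
        # True-positive rate: share of actual positives predicted positive.
        tprs.append(np.mean(y_pred[in_group & (y_true == 1)]))
        # False-positive rate: share of actual negatives predicted positive.
        fprs.append(np.mean(y_pred[in_group & (y_true == 0)]))
    return float(max(max(tprs) - min(tprs), max(fprs) - min(fprs)))


# Hypothetical toy data: two groups of four people each.
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

dp_gap = demographic_parity_difference(y_pred, group)
eo_gap = equalized_odds_difference(y_true, y_pred, group)
print(f"Demographic parity gap: {dp_gap:.2f}")  # 0.50 vs 0.25 selection rate
print(f"Equalized odds gap:     {eo_gap:.2f}")  # TPR 1.0 vs 0.5, FPR 0.0 vs 0.0
```

In this toy case the model flags half of group A but only a quarter of group B, and it misses one of the two true positives in group B. A value of zero on either metric would indicate no measured gap between the groups.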

Suggested Citation

  • John Tayu Lee & Sheng Hui Hsu & Vincent Cheng-Sheng Li & Kanya Anindya & Meng-Huan Chen & Charlotte Wang & Toby Kai-Bo Shen & Valerie Tzu Ning Liu & Hsiao-Hui Chen & Rifat Atun, 2025. "Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the longitudinal ageing study in India," PLOS Digital Health, Public Library of Science, vol. 4(11), pages 1-16, November.
  • Handle: RePEc:plo:pdig00:0000951
    DOI: 10.1371/journal.pdig.0000951

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000951
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000951&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Subramanian, S.V. & Subramanyam, Malavika A. & Selvaraj, Sakthivel & Kawachi, Ichiro, 2009. "Are self-reports of health and morbidities in developing countries misleading? Evidence from India," Social Science & Medicine, Elsevier, vol. 68(2), pages 260-265, January.
    2. Jayanta Kumar Bora & Rajesh Raushan & Wolfgang Lutz, 2019. "The persistent influence of caste on under-five mortality: Factors that explain the caste-based gap in high focus Indian states," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-20, August.
    3. Swati Dutta, 2022. "Risk factors for child survival among tribal dominated states in India: a pooled cross sectional analysis," Journal of Population Research, Springer, vol. 39(3), pages 391-416, September.
    4. Satrughan Behera & Atish Kumar Dash & Rathi Kanta Kumbhar, 2023. "Disparities in the Health and Well-being of Scheduled Tribes and Non-Scheduled Tribes Populations in India," Shanlax International Journal of Economics, Shanlax Journals, vol. 12(1), pages 69-77, December.
    5. Laurie Brown & Binod Nepal, 2011. "Modelling Potential Impact of Improved Survival of Indigenous Australians on Work-Life Labour Income Gap Between Indigenous and Average Australians," NATSEM Working Paper Series 11/15, University of Canberra, National Centre for Social and Economic Modelling.
    6. June Y T Po & S V Subramanian, 2011. "Mortality Burden and Socioeconomic Status in India," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-8, February.
    7. Ilana G. Raskind & Shailaja S. Patil & Regine Haardörfer & Solveig A. Cunningham, 2018. "Unhealthy Weight in Indian Families: The Role of the Family Environment in the Context of the Nutrition Transition," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 37(2), pages 157-180, April.
    8. Itismita Mohanty & Robert Tanton, 2012. "A wellbeing framework with adaptive capacity," NATSEM Working Paper Series 12/17, University of Canberra, National Centre for Social and Economic Modelling.
    9. Bandita Boro & Nandita Saikia, 2020. "A qualitative study of the barriers to utilizing healthcare services among the tribal population in Assam," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-14, October.
    10. Sahoo, Anil Kumar & Madheswaran, S, 2014. "Healthcare utilisation behaviour in India: Socio-economic disparities & the effect of health insurance," Working Papers 317, Institute for Social and Economic Change, Bangalore.
    11. Kennedy, Jonathan J. & King, Lawrence P., 2011. "Understanding the conviction of Binayak Sen: Neocolonialism, political violence and the political economy of health in the central Indian tribal belt," Social Science & Medicine, Elsevier, vol. 72(10), pages 1639-1642, May.
    12. Aiyar, Anaka & Dhingra, Sunaina & Pingali, Prabhu, 2021. "Transitioning to an obese India: Demographic and structural determinants of the rapid rise in overweight incidence," Economics & Human Biology, Elsevier, vol. 43(C).
    13. Aiyar, Anaka & Rahman, Andaleeb & Pingali, Prabhu, 2021. "India’s rural transformation and rising obesity burden," World Development, Elsevier, vol. 138(C).
    14. Siddiqui, Zakaria & Donato, Ronald, 2020. "The dramatic rise in the prevalence of overweight and obesity in India: Obesity transition and the looming health care crisis," World Development, Elsevier, vol. 134(C).
    15. Pathak, Praveen Kumar & Singh, Abhishek, 2011. "Trends in malnutrition among children in India: Growing inequalities across different economic groups," Social Science & Medicine, Elsevier, vol. 73(4), pages 576-585, August.
    16. Anjana Rai & Swadesh Gurung & Subash Thapa & Naomi M Saville, 2019. "Correlates and inequality of underweight and overweight among women of reproductive age: Evidence from the 2016 Nepal Demographic Health Survey," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-16, May.
    17. Jayanta Kumar Bora & Rajesh Raushan & Wolfgang Lutz, 2018. "Contribution of Education to Infant and Under-Five Mortality Disparities among Caste Groups in India," VID Working Papers 1803, Vienna Institute of Demography (VID) of the Austrian Academy of Sciences in Vienna.
    18. Sandeep S. Nerkar & Ashish Pathak & Cecilia Stålsby Lundborg & Ashok J. Tamhankar, 2015. "Can Integrated Watershed Management Contribute to Improvement of Public Health? A Cross-Sectional Study from Hilly Tribal Villages in India," IJERPH, MDPI, vol. 12(3), pages 1-17, February.
    19. Christophe Z Guilmoto, 2022. "An alternative estimation of the death toll of the Covid-19 pandemic in India," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-14, February.
    20. Anirudh Krishna & Kripa Ananthpur, 2013. "Globalization, Distance and Disease: Spatial Health Disparities in Rural India," Millennial Asia, vol. 4(1), pages 3-25, April.