
Classification of Obesity among South African Female Adolescents: Comparative Analysis of Logistic Regression and Random Forest Algorithms

Author

Listed:
  • Ronel Sewpaul

    (Public Health, Societies and Belonging, Human Sciences Research Council, Merchant House, 2 Dock Rail Road, Cape Town 8001, South Africa)

  • Olushina Olawale Awe

    (Institute of Mathematics, Statistics and Scientific Computing (IMECC), University of Campinas, Campinas 13083-859, Brazil)

  • Dennis Makafui Dogbey

    (Medical Biotechnology and Immunotherapy Research Unit, Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7700, South Africa)

  • Machoene Derrick Sekgala

    (Non-Communicable Diseases, South African Medical Research Council, Cape Town 7505, South Africa)

  • Natisha Dukhi

    (Public Health, Societies and Belonging, Human Sciences Research Council, Merchant House, 2 Dock Rail Road, Cape Town 8001, South Africa)

Abstract

Background: This study evaluates the performance of logistic regression (LR) and random forest (RF) algorithms in classifying obesity among female adolescents in South Africa. Methods: Data were analysed for 375 females aged 15–17 years from the South African National Health and Nutrition Examination Survey 2011/2012. The primary outcome was obesity, defined as body mass index (BMI) ≥ 30 kg/m². A total of 31 explanatory variables were included, covering socio-economic, demographic, family history, dietary, and health behaviour factors. RF and LR models were fitted to the original imbalanced data and to the data after oversampling, undersampling, and hybrid sampling. Results: Using the imbalanced data, the RF model performed better, with higher precision, recall, F1 score, and balanced accuracy. Balanced accuracy was highest with the hybrid data (0.618 for RF and 0.668 for LR). Using the hybrid balanced data, the RF model performed better on the F1 score (0.940 for RF vs. 0.798 for LR). Conclusion: The model with the highest overall performance metrics was the RF model, both before balancing the data and after applying hybrid balancing. Future work would benefit from using larger datasets on adolescent female obesity to assess the robustness of the models.
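The metrics and the hybrid balancing named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how hybrid sampling was implemented, so `hybrid_resample` assumes one common variant (random oversampling of the minority class plus random undersampling of the majority class to the midpoint of the two class counts). The function names, the 0/1 label coding, and the toy data are all hypothetical.

```python
import random


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and balanced accuracy for labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Mean of sensitivity and specificity: not inflated by class imbalance.
        "balanced_accuracy": (recall + specificity) / 2,
    }


def hybrid_resample(rows, labels, seed=0):
    """Assumed hybrid scheme: oversample the minority class (with replacement)
    and undersample the majority class (without replacement) so that both
    reach the midpoint of the two original class counts."""
    rng = random.Random(seed)
    ones = [r for r, y in zip(rows, labels) if y == 1]
    zeros = [r for r, y in zip(rows, labels) if y == 0]
    if not ones or not zeros:
        raise ValueError("both classes must be present")

    minority, min_label = (ones, 1) if len(ones) <= len(zeros) else (zeros, 0)
    majority, maj_label = (zeros, 0) if min_label == 1 else (ones, 1)
    target = (len(minority) + len(majority)) // 2

    # Keep every minority row, then draw extra copies with replacement.
    new_minority = minority + rng.choices(minority, k=target - len(minority))
    # Draw a subset of the majority rows without replacement.
    new_majority = rng.sample(majority, target)

    new_rows = new_minority + new_majority
    new_labels = [min_label] * len(new_minority) + [maj_label] * len(new_majority)
    return new_rows, new_labels


# Toy example: 1 positive among 10 cases, classifier predicts "not obese" for all.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
metrics = classification_metrics(y_true, y_pred)   # balanced_accuracy is 0.5

# Toy example: 2 positives and 8 negatives become 5 and 5 after resampling.
rows, labels = hybrid_resample(list(range(10)), [1, 1] + [0] * 8)
```

In the first toy example plain accuracy would be 0.9 even though no positive case is detected, while balanced accuracy is 0.5. This is why balanced accuracy and F1 are more informative than accuracy when the outcome is rare.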

Suggested Citation

  • Ronel Sewpaul & Olushina Olawale Awe & Dennis Makafui Dogbey & Machoene Derrick Sekgala & Natisha Dukhi, 2023. "Classification of Obesity among South African Female Adolescents: Comparative Analysis of Logistic Regression and Random Forest Algorithms," IJERPH, MDPI, vol. 21(1), pages 1-15, December.
  • Handle: RePEc:gam:jijerp:v:21:y:2023:i:1:p:2-:d:1303380

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/21/1/2/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/21/1/2/
    Download Restriction: no

    References listed on IDEAS

    1. Shaoyan Zhang & Christos Tjortjis & Xiaojun Zeng & Hong Qiao & Iain Buchan & John Keane, 2009. "Comparing data mining methods with logistic regression in childhood obesity prediction," Information Systems Frontiers, Springer, vol. 11(4), pages 449-460, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carlos Magno Sousa & Ewaldo Santana & Marcus Vinicius Lopes & Guilherme Lima & Luana Azoubel & Érika Carneiro & Allan Kardec Barros & Nilviane Pires, 2019. "Development of a Computational Model to Predict Excess Body Fat in Adolescents through Low Cost Variables," IJERPH, MDPI, vol. 16(16), pages 1-12, August.
    2. Nida Shahid & Tim Rappon & Whitney Berta, 2019. "Applications of artificial neural networks in health care organizational decision-making: A scoping review," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-22, February.
    3. Cheong Kim & Francis Joseph Costello & Kun Chang Lee & Yuan Li & Chenyao Li, 2019. "Predicting Factors Affecting Adolescent Obesity Using General Bayesian Network and What-If Analysis," IJERPH, MDPI, vol. 16(23), pages 1-18, November.
    4. Davide Barbieri & Nitesh Chawla & Luciana Zaccagni & Tonći Grgurinović & Jelena Šarac & Miran Čoklo & Saša Missoni, 2020. "Predicting Cardiovascular Risk in Athletes: Resampling Improves Classification Performance," IJERPH, MDPI, vol. 17(21), pages 1-9, October.

