Printed from https://ideas.repec.org/a/gam/jijerp/v21y2024i11p1474-d1514993.html

Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022

Authors
  • Wei Fang

    (West Virginia Clinical and Translational Science Institute, Morgantown, WV 26506, USA)

  • Ying Liu

    (Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, TN 37614, USA)

  • Chun Xu

    (Department of Health and Biomedical Sciences, College of Health Professions, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA)

  • Xingguang Luo

    (Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06516, USA)

  • Kesheng Wang

    (Department of Biobehavioral Health & Nursing Science, College of Nursing, University of South Carolina, Columbia, SC 29208, USA)

Abstract

Feature selection is the process of picking informative, relevant features from a larger set. Few studies have examined predictors of current e-cigarette use among U.S. adults using feature selection and machine learning (ML) approaches. This study aimed to perform feature selection and develop ML models to predict current e-cigarette use using the 2022 Health Information National Trends Survey (HINTS 6). The Boruta algorithm and the least absolute shrinkage and selection operator (LASSO) were used to select features from 71 variables. The Random Over-Sampling Examples (ROSE) method was used to address class imbalance. Five ML algorithms were applied to develop models: support vector machines (SVMs), logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost). The overall prevalence of current e-cigarette use was 4.3%. Using the 15 variables selected by both Boruta and LASSO, the RF algorithm provided the best classifier, with an accuracy of 0.992, sensitivity of 0.985, F1 score of 0.991, and AUC of 0.999. Weighted logistic regression further confirmed that age, education level, smoking status, belief in the harm of e-cigarette use, binge drinking, belief that alcohol increases cancer risk, and the Patient Health Questionnaire-4 (PHQ4) score were associated with e-cigarette use. This study confirmed the strength of ML techniques in survey data, and the findings will guide inquiry into behaviors and mentalities of substance users.
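The three generic steps named in the abstract (keeping the features chosen by two selectors, rebalancing a rare outcome, and scoring a binary classifier) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The helper names (`overlap`, `random_oversample`, `binary_metrics`) and the feature names are hypothetical, and plain random oversampling is used as a simpler stand-in for ROSE, which generates synthetic examples rather than exact duplicates.

```python
import random
from collections import Counter


def overlap(selected_a, selected_b):
    """Keep features chosen by both selectors, in the order of the first list."""
    chosen_b = set(selected_b)
    return [f for f in selected_a if f in chosen_b]


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    Assumption: plain random oversampling, not the ROSE procedure itself.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


# Hypothetical feature names, for illustration only.
shared = overlap(["age", "education", "bmi"], ["education", "age", "income"])

# Toy data: three non-users (0) and one user (1).
rows, labels = random_oversample([[1], [2], [3], [4]], [0, 0, 0, 1])

metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

With a prevalence of 4.3%, a classifier that labels everyone a non-user would already reach about 95.7% accuracy while detecting no users, which is one reason sensitivity and F1 are worth reading alongside accuracy for an outcome this rare.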

Suggested Citation

  • Wei Fang & Ying Liu & Chun Xu & Xingguang Luo & Kesheng Wang, 2024. "Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022," IJERPH, MDPI, vol. 21(11), pages 1-14, November.
  • Handle: RePEc:gam:jijerp:v:21:y:2024:i:11:p:1474-:d:1514993

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/21/11/1474/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/21/11/1474/
    Download Restriction: no

    References listed on IDEAS

    1. Kursa, Miron B. & Rudnicki, Witold R., 2010. "Feature Selection with the Boruta Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 36(i11).
    2. Nkiruka C. Atuegwu & Mark D. Litt & Suchitra Krishnan-Sarin & Reinhard C. Laubenbacher & Mario F. Perez & Eric M. Mortensen, 2021. "E-Cigarette Use in Young Adult Never Cigarette Smokers with Disabilities: Results from the Behavioral Risk Factor Surveillance System Survey," IJERPH, MDPI, vol. 18(10), pages 1-13, May.
    3. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    4. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    5. Md Raihan-Al-Masud & M Rubaiyat Hossain Mondal, 2020. "Data-driven diagnosis of spinal abnormalities using feature selection and machine learning algorithms," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-21, February.
    6. Michael Short & Adam Geoffrey Cole, 2021. "Factors Associated with E-Cigarette Escalation among High School Students: A Review of the Literature," IJERPH, MDPI, vol. 18(19), pages 1-10, September.
    7. Nkiruka C. Atuegwu & Cheryl Oncken & Reinhard C. Laubenbacher & Mario F. Perez & Eric M. Mortensen, 2020. "Factors Associated with E-Cigarette Use in U.S. Young Adult Never Smokers of Conventional Cigarettes: A Machine Learning Approach," IJERPH, MDPI, vol. 17(19), pages 1-16, October.
    8. Kim A.G.J. Romijnders & Jeroen L.A. Pennings & Liesbeth van Osch & Hein de Vries & Reinskje Talhout, 2019. "A Combination of Factors Related to Smoking Behavior, Attractive Product Characteristics, and Socio-Cognitive Factors are Important to Distinguish a Dual User from an Exclusive E-Cigarette User," IJERPH, MDPI, vol. 16(21), pages 1-12, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bellotti, Anthony & Brigo, Damiano & Gambetti, Paolo & Vrins, Frédéric, 2021. "Forecasting recovery rates on non-performing loans with machine learning," International Journal of Forecasting, Elsevier, vol. 37(1), pages 428-444.
    2. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    3. Arjan S. Gosal & Janine A. McMahon & Katharine M. Bowgen & Catherine H. Hoppe & Guy Ziv, 2021. "Identifying and Mapping Groups of Protected Area Visitors by Environmental Awareness," Land, MDPI, vol. 10(6), pages 1-14, May.
    4. Foutzopoulos, Giorgos & Pandis, Nikolaos & Tsagris, Michail, 2024. "Predicting full retirement attainment of NBA players," MPRA Paper 121540, University Library of Munich, Germany.
    5. Francesco Sartor & Jonathan P. Moore & Hans-Peter Kubis, 2021. "Plasma Interleukin-10 and Cholesterol Levels May Inform about Interdependences between Fitness and Fatness in Healthy Individuals," IJERPH, MDPI, vol. 18(4), pages 1-19, February.
    6. Van Belle, Jente & Guns, Tias & Verbeke, Wouter, 2021. "Using shared sell-through data to forecast wholesaler demand in multi-echelon supply chains," European Journal of Operational Research, Elsevier, vol. 288(2), pages 466-479.
    7. Jun Wang & Jinyong Huang & Yunlong Hu & Qianwen Guo & Shasha Zhang & Jinglin Tian & Yanqin Niu & Ling Ji & Yuzhong Xu & Peijun Tang & Yaqin He & Yuna Wang & Shuya Zhang & Hao Yang & Kang Kang & Xinchu, 2024. "Terminal modifications independent cell-free RNA sequencing enables sensitive early cancer detection and classification," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    8. Faisal Alsayegh & Moh A Alkhamis & Fatima Ali & Sreeja Attur & Nicholas M Fountain-Jones & Mohammad Zubaid, 2022. "Anemia or other comorbidities? using machine learning to reveal deeper insights into the drivers of acute coronary syndromes in hospital admitted patients," PLOS ONE, Public Library of Science, vol. 17(1), pages 1-15, January.
    9. Franck M. Ramaharo & Michael Fitiavana Randriamifidy, 2023. "Determinants of renewable energy consumption in Madagascar: Evidence from feature selection algorithms," Working Papers hal-04262240, HAL.
    10. Nkiruka C. Atuegwu & Mark D. Litt & Suchitra Krishnan-Sarin & Reinhard C. Laubenbacher & Mario F. Perez & Eric M. Mortensen, 2021. "E-Cigarette Use in Young Adult Never Cigarette Smokers with Disabilities: Results from the Behavioral Risk Factor Surveillance System Survey," IJERPH, MDPI, vol. 18(10), pages 1-13, May.
    11. Siddharth Sethi & David Zhang & Sebastian Guelfi & Zhongbo Chen & Sonia Garcia-Ruiz & Emmanuel O. Olagbaju & Mina Ryten & Harpreet Saini & Juan A. Botia, 2022. "Leveraging omic features with F3UTER enables identification of unannotated 3’UTRs for synaptic genes," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    12. Michimasa Fujiogi & Yoshihiko Raita & Marcos Pérez-Losada & Robert J. Freishtat & Juan C. Celedón & Jonathan M. Mansbach & Pedro A. Piedra & Zhaozhong Zhu & Carlos A. Camargo & Kohei Hasegawa, 2022. "Integrated relationship of nasopharyngeal airway host response and microbiome associates with bronchiolitis severity," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    13. Erik Duijvelaar & Jack Gisby & James E. Peters & Harm Jan Bogaard & Jurjan Aman, 2024. "Longitudinal plasma proteomics reveals biomarkers of alveolar-capillary barrier disruption in critically ill COVID-19 patients," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    14. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    15. Satre-Meloy, Aven & Diakonova, Marina & Grünewald, Philipp, 2020. "Cluster analysis and prediction of residential peak demand profiles using occupant activity data," Applied Energy, Elsevier, vol. 260(C).
    16. Alexander Kirpich & Elizabeth A Ainsworth & Jessica M Wedow & Jeremy R B Newman & George Michailidis & Lauren M McIntyre, 2018. "Variable selection in omics data: A practical evaluation of small sample sizes," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-19, June.
    17. Sara Saadatmand & Khodakaram Salimifard & Reza Mohammadi & Alex Kuiper & Maryam Marzban & Akram Farhadi, 2023. "Using machine learning in prediction of ICU admission, mortality, and length of stay in the early stage of admission of COVID-19 patients," Annals of Operations Research, Springer, vol. 328(1), pages 1043-1071, September.
    18. María Bueno Álvez & Fredrik Edfors & Kalle Feilitzen & Martin Zwahlen & Adil Mardinoglu & Per-Henrik Edqvist & Tobias Sjöblom & Emma Lundin & Natallia Rameika & Gunilla Enblad & Henrik Lindman & Marti, 2023. "Next generation pan-cancer blood proteome profiling using proximity extension assay," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    19. Fitzpatrick, Trevor & Mues, Christophe, 2016. "An empirical comparison of classification algorithms for mortgage default prediction: evidence from a distressed mortgage market," European Journal of Operational Research, Elsevier, vol. 249(2), pages 427-439.
    20. Svetlana Kresova & Sebastian Hess, 2022. "Identifying the Determinants of Regional Raw Milk Prices in Russia Using Machine Learning," Agriculture, MDPI, vol. 12(7), pages 1-18, July.

