IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0306359.html

Comparison of model feature importance statistics to identify covariates that contribute most to model accuracy in prediction of insomnia

Author

Listed:
  • Alexander A Huang
  • Samuel Y Huang

Abstract

Importance: Sleep is critical to a person’s physical and mental health, and there is a need both to build high-performing machine learning models and to understand critically how those models rank covariates.

Objective: To compare how different model metrics rank the importance of covariates.

Design, setting, and participants: A retrospective cross-sectional cohort study using the publicly available National Health and Nutrition Examination Survey (NHANES).

Methods: Univariate logistic models were used to select covariates strongly and independently associated with the sleep disorder outcome. These covariates were then used in machine learning models, from which the best-performing model was chosen. That model ranked covariates by gain, cover, and frequency to identify risk factors for sleep disorder, and feature importance was also evaluated using univariable and multivariable t-statistics. A correlation matrix was created to determine how similarly the different model metrics ranked variable importance.

Results: The XGBoost model had the highest mean AUROC, 0.865 (SD = 0.010), with accuracy of 0.762 (SD = 0.019), F1 of 0.875 (SD = 0.766), sensitivity of 0.768 (SD = 0.023), specificity of 0.782 (SD = 0.025), positive predictive value of 0.806 (SD = 0.025), and negative predictive value of 0.737 (SD = 0.034). The machine learning metrics gain and cover were strongly positively correlated with one another (r > 0.70). Metrics from the multivariable and univariable models were weakly negatively correlated with the machine learning metrics (r between -0.3 and 0).

Conclusion: In this cohort, the ranking of important variables associated with sleep disorder from the machine learning models was not related to the ranking from the regression models.
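
The correlation-matrix comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The covariates and importance values below are synthetic placeholders, not NHANES results, and the use of Pearson correlation on raw importance scores is an assumption, since the abstract does not say which coefficient was used. The names `pearson`, `correlation_matrix`, and `importance` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(scores):
    """Pairwise Pearson correlations between named importance metrics.

    `scores` maps a metric name to a list of per-covariate importance
    values, with covariates in the same order for every metric.
    """
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# SYNTHETIC values for five hypothetical covariates (illustration only).
importance = {
    "gain": [0.40, 0.25, 0.20, 0.10, 0.05],
    "cover": [0.35, 0.30, 0.15, 0.12, 0.08],
    "frequency": [0.20, 0.25, 0.30, 0.15, 0.10],
    "abs_t_univariable": [1.2, 2.5, 0.8, 3.1, 2.9],
}

matrix = correlation_matrix(importance)

# Print the matrix one row per metric.
for row_name, row in matrix.items():
    cells = "  ".join(f"{value:+.2f}" for value in row.values())
    print(f"{row_name:>18}  {cells}")
```

With a fitted model from the Python `xgboost` package, per-feature scores of this kind can be read with `Booster.get_score(importance_type=...)`, where `"weight"` corresponds to frequency. Whether the authors used that interface is not stated in the abstract.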

Suggested Citation

  • Alexander A Huang & Samuel Y Huang, 2024. "Comparison of model feature importance statistics to identify covariates that contribute most to model accuracy in prediction of insomnia," PLOS ONE, Public Library of Science, vol. 19(7), pages 1-12, July.
  • Handle: RePEc:plo:pone00:0306359
    DOI: 10.1371/journal.pone.0306359

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0306359
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0306359&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0306359?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
