
Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning

Authors

  • Demeke Endalie
  • Wondmagegn Taye Abebe

Abstract

Cancer is a broad term for a wide range of diseases that can affect any part of the human body. To reduce the number of cancer deaths and to design appropriate health policies for mitigating the spread of cancer, scientifically supported knowledge of its causes is critical. In this study, we therefore analyzed the lung cancer risk factors associated with high-severity cases using a decision tree-based ranking algorithm. This feature relevance ranking algorithm computes a weight for each feature in the dataset from the split points the tree uses to improve detection accuracy; each risk factor is weighted by the number of observations associated with it in the decision tree. Of the nine risk factors, coughing up blood, air pollution, and obesity carry the highest weights: 39%, 21%, and 14%, respectively. We also propose a machine learning model that uses Extreme Gradient Boosting (XGBoost) to detect the severity level of lung cancer in patients. To assess the model's performance, we used a dataset of 1000 lung cancer patients and 465 individuals free from lung cancer from Tikur Ambesa (Black Lion) Hospital in Addis Ababa, Ethiopia. On the test set, the proposed severity-level detection model achieved 98.9% accuracy, 99% precision, and 98.9% recall. These findings can help governmental and non-governmental organizations make lung cancer-related policy decisions.

Author summary

Lung cancer has become one of the leading causes of death in Ethiopia. Lung cancer risk factors vary from place to place because they depend on people's socio-cultural activities. In this study, we examine lung cancer risk factors using the medical records of lung cancer patients in Addis Ababa, Ethiopia. The data contain the medical records of 872 women and 593 men. We identified the key risk variables for lung cancer in the study area using a decision tree.
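The abstract describes weighting each risk factor from the split points of a decision tree. One widely used scheme of this kind is mean decrease in impurity (MDI): each split's Gini impurity decrease is weighted by the fraction of samples reaching that node, summed per feature, and normalized so the weights sum to 1. The sketch below assumes MDI with Gini impurity as a stand-in; the authors' exact ranking algorithm may differ. The helper names (`gini`, `Split`, `mdi_importance`, `TOY_SPLITS`), the feature names, and the class counts are hypothetical toy values, not the study's data.

```python
from collections import defaultdict
from dataclasses import dataclass


def gini(counts):
    """Gini impurity of a node, given its per-class sample counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


@dataclass
class Split:
    """One internal node of a fitted tree (hypothetical record format)."""
    feature: str   # risk factor tested at this node
    parent: list   # class counts reaching the node
    left: list     # class counts sent to the left child
    right: list    # class counts sent to the right child


def mdi_importance(splits, n_total):
    """Mean decrease in Gini impurity per feature, normalized to sum to 1."""
    raw = defaultdict(float)
    for s in splits:
        # Sample-weighted impurity decrease produced by this split
        decrease = (sum(s.parent) * gini(s.parent)
                    - sum(s.left) * gini(s.left)
                    - sum(s.right) * gini(s.right)) / n_total
        raw[s.feature] += decrease
    total = sum(raw.values())
    if total == 0:
        return {feature: 0.0 for feature in raw}
    return {feature: value / total for feature, value in raw.items()}


# Hypothetical toy tree over 100 records; counts are [high severity, other].
TOY_SPLITS = [
    Split("coughing_blood", parent=[50, 50], left=[40, 10], right=[10, 40]),
    Split("air_pollution", parent=[40, 10], left=[35, 2], right=[5, 8]),
    Split("obesity", parent=[10, 40], left=[6, 10], right=[4, 30]),
]

if __name__ == "__main__":
    weights = mdi_importance(TOY_SPLITS, n_total=100)
    # Rank risk factors from most to least important
    for name, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {weight:.2f}")
```

With these toy counts, the root split on `coughing_blood` removes the most impurity, so it ranks first; the numbers only illustrate the mechanism and are unrelated to the study's reported weights.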
We found that coughing up blood is the most heavily weighted risk factor for lung cancer, with a weight of 0.39. A feature importance of 0.39 means that the feature accounts for 39% of the total importance across all features in the detection model. Air pollution and obesity are the next most important risk factors, with relevance weights of 0.21 and 0.14, respectively. This suggests that these factors cause or indicate most lung cancer cases in the study area. Together, the three factors account for 74% (0.39 + 0.21 + 0.14) of the total feature importance. We also use an XGBoost classifier to detect lung cancer severity levels from the risk factors, and it achieves high detection performance (98.9% accuracy on the test set).
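The abstract reports accuracy, precision, and recall for the XGBoost severity model but does not say how per-class precision and recall were averaged across severity levels. The sketch below is a stand-in for that evaluation step, not the authors' code: it computes all three metrics from label lists in plain Python under two common averaging schemes. The function `severity_metrics`, the low/medium/high labels, and the toy predictions are hypothetical.

```python
from collections import Counter


def severity_metrics(y_true, y_pred, average="weighted"):
    """Return (accuracy, precision, recall) for multi-class severity labels.

    average="macro" gives every class equal weight; average="weighted"
    weights each class by its support (number of true instances).
    """
    if average not in ("macro", "weighted"):
        raise ValueError("average must be 'macro' or 'weighted'")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predicted instances per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    labels = sorted(set(support) | set(predicted))
    precision_sum = recall_sum = weight_sum = 0.0
    for label in labels:
        tp = correct[label]
        # Treat 0/0 as 0.0 when a class is never predicted or never present
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        weight = support[label] if average == "weighted" else 1.0
        precision_sum += weight * precision
        recall_sum += weight * recall
        weight_sum += weight

    accuracy = sum(correct.values()) / len(y_true)
    return accuracy, precision_sum / weight_sum, recall_sum / weight_sum


# Hypothetical toy predictions; labels are illustrative, not from the study.
Y_TRUE = ["low", "low", "medium", "medium", "high", "high", "high", "high"]
Y_PRED = ["low", "medium", "medium", "medium", "high", "high", "high", "low"]

if __name__ == "__main__":
    for avg in ("weighted", "macro"):
        acc, prec, rec = severity_metrics(Y_TRUE, Y_PRED, average=avg)
        print(f"{avg}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

One consequence of support-weighted averaging: recall always equals accuracy, because each class's recall (true positives divided by support) is multiplied by its support before dividing by the total count. That is consistent with, though not proof of, the identical 98.9% accuracy and recall reported for the test set.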

Suggested Citation

  • Demeke Endalie & Wondmagegn Taye Abebe, 2023. "Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning," PLOS Digital Health, Public Library of Science, vol. 2(7), pages 1-16, July.
  • Handle: RePEc:plo:pdig00:0000308
    DOI: 10.1371/journal.pdig.0000308

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000308
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000308&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pdig.0000308?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

      IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.