
Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis

Author

Listed:
  • Amira M. Elsherbini

    (Department of Oral Biology, Faculty of Dentistry, Mansoura University, Mansoura 35116, Egypt)

  • Alsamman M. Alsamman

    (Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza 12619, Egypt)

  • Nehal M. Elsherbiny

    (Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Tabuk, Tabuk 71491, Saudi Arabia
    Department of Biochemistry, Faculty of Pharmacy, Mansoura University, Mansoura 35116, Egypt)

  • Mohamed El-Sherbiny

    (Department of Basic Medical Sciences, College of Medicine, AlMaarefa University, Riyadh 71666, Saudi Arabia
    Department of Anatomy, Mansoura Faculty of Medicine, Mansoura University, Mansoura 35116, Egypt)

  • Rehab Ahmed

    (Department of Natural Products and Alternative Medicine, Faculty of Pharmacy, University of Tabuk, Tabuk 71491, Saudi Arabia
    Department of Pharmaceutics, Faculty of Pharmacy, University of Khartoum, Khartoum 11111, Sudan)

  • Hasnaa Ali Ebrahim

    (Department of Basic Medical Sciences, College of Medicine, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia)

  • Joaira Bakkach

    (Biomedical Genomics and Oncogenetics Research Laboratory, Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaâdi University Morocco, Tétouan 93000, Morocco)

Abstract

The molecular basis of diabetes mellitus is yet to be fully elucidated. We aimed to identify the most frequently reported and differentially expressed genes (DEGs) in diabetes by using bioinformatics approaches. Text mining was used to screen 40,225 article abstracts from the diabetes literature. These studies highlighted 5939 diabetes-related genes spread across 22 human chromosomes, with 112 genes mentioned in more than 50 studies. Among these genes, HNF4A, PPARA, VEGFA, TCF7L2, HLA-DRB1, PPARG, NOS3, KCNJ11, PRKAA2, and HNF1A were mentioned in more than 200 articles. These genes are correlated with the regulation of glycogen and polysaccharide, adipogenesis, AGE/RAGE, and macrophage differentiation. Three datasets (44 patients and 57 controls) were subjected to gene expression analysis. The analysis revealed 135 significant DEGs, of which CEACAM6, ENPP4, HDAC5, HPCAL1, PARVG, STYXL1, VPS28, ZBTB33, ZFP37, and CCDC58 were the top 10 DEGs. These genes were enriched in aerobic respiration, T-cell antigen receptor pathway, tricarboxylic acid metabolic process, vitamin D receptor pathway, toll-like receptor signaling, and endoplasmic reticulum (ER) unfolded protein response. The results of the text mining and gene expression analyses were used as attribute values for machine learning (ML) analysis. The decision tree, extra-tree regressor, and random forest algorithms were used in the ML analysis to identify unique markers that could be used as diabetes diagnosis tools. These algorithms produced prediction models with accuracies ranging from 0.6364 to 0.88 and an overall confidence interval (CI) of 95%. There were 39 biomarkers that could distinguish diabetic from non-diabetic patients, 12 of which were repeated multiple times. The majority of these genes are associated with stress response, signalling regulation, locomotion, cell motility, growth, and muscle adaptation. Machine learning algorithms highlighted the use of the HLA-DQB1 gene as a biomarker for early detection of diabetes. Our data mining and gene expression analysis have provided useful information about potential biomarkers in diabetes.
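
The text-mining step described in the abstract (counting how many article abstracts mention each gene, then keeping genes above a threshold such as 50 or 200 mentions) can be illustrated with a minimal sketch. The abstract does not specify the tool or matching rules the authors used, so everything below is an assumption made for illustration: the function names `count_gene_mentions` and `genes_above`, the case-sensitive whole-symbol matching, and the toy abstracts are all hypothetical.

```python
import re
from collections import Counter


def count_gene_mentions(abstracts, gene_symbols):
    """Count, for each gene symbol, the number of abstracts that mention it.

    Assumption: a "mention" is a case-sensitive match of the whole symbol,
    not embedded in a longer alphanumeric or hyphenated token.
    """
    patterns = {
        gene: re.compile(
            r"(?<![A-Za-z0-9-])" + re.escape(gene) + r"(?![A-Za-z0-9-])"
        )
        for gene in gene_symbols
    }
    counts = Counter()
    for text in abstracts:
        for gene, pattern in patterns.items():
            if pattern.search(text):
                counts[gene] += 1  # each abstract counts once per gene
    return counts


def genes_above(counts, threshold):
    """Return the sorted gene symbols mentioned in more than `threshold` abstracts."""
    return sorted(gene for gene, n in counts.items() if n > threshold)


# Toy input (hypothetical sentences, not taken from the screened literature).
toy_abstracts = [
    "Variants in TCF7L2 and KCNJ11 are associated with type 2 diabetes.",
    "PPARG agonists improve insulin sensitivity; TCF7L2 was also examined.",
    "HLA-DRB1 alleles and TCF7L2 in autoimmune diabetes.",
]
toy_genes = ["TCF7L2", "KCNJ11", "PPARG", "HLA-DRB1", "PPARA"]

counts = count_gene_mentions(toy_abstracts, toy_genes)
frequent = genes_above(counts, 1)  # genes mentioned in more than one abstract
```

On a real corpus the same counting would run over all screened abstracts with the thresholds raised to 50 or 200; gene aliases and ambiguous symbols would need extra handling that this sketch leaves out.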

Suggested Citation

  • Amira M. Elsherbini & Alsamman M. Alsamman & Nehal M. Elsherbiny & Mohamed El-Sherbiny & Rehab Ahmed & Hasnaa Ali Ebrahim & Joaira Bakkach, 2022. "Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis," IJERPH, MDPI, vol. 19(21), pages 1-18, October.
  • Handle: RePEc:gam:jijerp:v:19:y:2022:i:21:p:13890-:d:953503

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/19/21/13890/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/19/21/13890/
    Download Restriction: no


