
Using machine learning to predict and analyze complex trait diseases: Lessons from a simple abstract model

Author

Listed:
  • Eden Maimon
  • Ori Bondi
  • John Moult
  • Ron Unger

Abstract

Predicting an individual's genetic susceptibility to a complex trait disease is a major challenge in modern medicine. One approach uses an additive combination of contributions from a large number of single nucleotide polymorphisms (SNPs), with weights derived from genome-wide association studies (GWAS). While this approach is somewhat successful in predicting whether an individual is likely to develop a specific disease, it does not explain why a person is likely to become sick. Here, we designed and used abstract disease models to investigate the relationship between disease structure, susceptibility, and predictability. Each model consists of a set of interacting pathways, each including several nodes representing loci at which genetic variants can alter the function of the corresponding proteins. Because of the thresholds introduced for pathway functionality, and the interplay between the pathways, the model is inherently non-additive. We use this “toy model” together with simulated variant data to examine the effect of changing various properties, some of which cannot be easily controlled in a “real-world” scenario. As expected, larger sample sizes improved performance; omitting some contributing variants from the dataset was associated with a significant decrease in performance, whereas adding irrelevant variants had little effect. Surprisingly, diseases with a more complex underlying structure were better predicted than those with a simpler structure. In addition, risk prediction was more accurate for diseases with lower prevalence. The algorithm was robust to a reasonable percentage of false negative disease assignments. The largest decrease in performance occurred when two diseases with different genetic etiologies were classified as a single pathology, a situation that often arises in clinical practice and apparently confuses the neural network algorithm. Finally, we show that a post-analysis of a neural network using t-SNE can provide biological insights into the underlying disease structure.
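
The kind of threshold-based pathway model the abstract describes can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the number of pathways, the nodes per pathway, the variant frequency, or the threshold rules, so every function name and parameter value below is a hypothetical choice made only for illustration. The sketch shows why such a model is non-additive: two genotypes with the same total number of damaging variants can differ in disease status depending on how the variants are distributed across pathways.

```python
import random


def simulate_individual(rng, n_pathways=3, nodes_per_pathway=4, variant_freq=0.3):
    """Return a genotype: one list per pathway, one 0/1 flag per node.

    A flag of 1 marks a damaging variant at that locus. All sizes and the
    variant frequency are assumed values, not taken from the paper.
    """
    return [
        [1 if rng.random() < variant_freq else 0 for _ in range(nodes_per_pathway)]
        for _ in range(n_pathways)
    ]


def pathway_broken(pathway, node_threshold=2):
    """Assumed rule: a pathway loses function once it carries at least
    `node_threshold` damaging variants."""
    return sum(pathway) >= node_threshold


def is_affected(genotype, node_threshold=2, pathway_threshold=2):
    """Assumed rule: disease occurs when at least `pathway_threshold`
    pathways are broken."""
    broken = sum(pathway_broken(p, node_threshold) for p in genotype)
    return broken >= pathway_threshold


def additive_score(genotype):
    """Unweighted additive burden: the total count of damaging variants."""
    return sum(sum(p) for p in genotype)


# Two hand-built genotypes with the same additive burden (4 variants each).
concentrated = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]  # two broken pathways
spread = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]        # one broken pathway

print(additive_score(concentrated), is_affected(concentrated))
print(additive_score(spread), is_affected(spread))

# Simulated cohort; the prevalence depends entirely on the assumed parameters.
rng = random.Random(0)
cohort = [simulate_individual(rng) for _ in range(2000)]
labels = [is_affected(g) for g in cohort]
prevalence = sum(labels) / len(labels)
print(f"simulated prevalence: {prevalence:.3f}")
```

Under these assumed rules, an additive score alone cannot separate the two hand-built genotypes, which is the property that motivates using a non-linear classifier such as a neural network on the simulated data.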

Suggested Citation

  • Eden Maimon & Ori Bondi & John Moult & Ron Unger, 2026. "Using machine learning to predict and analyze complex trait diseases: Lessons from a simple abstract model," PLOS ONE, Public Library of Science, vol. 21(2), pages 1-23, February.
  • Handle: RePEc:plo:pone00:0342490
    DOI: 10.1371/journal.pone.0342490

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0342490
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0342490&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carles Foguet & Yu Xu & Scott C. Ritchie & Samuel A. Lambert & Elodie Persyn & Artika P. Nath & Emma E. Davenport & David J. Roberts & Dirk S. Paul & Emanuele Angelantonio & John Danesh & Adam S. Butt, 2022. "Genetically personalised organ-specific metabolic models in health and disease," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    2. Zonghuang Xu & Jin Shi, 2025. "Research on the national security risk assessment model: a case study of political security in China," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 12(1), pages 1-13, December.
    3. Atlas Khan & Ning Shang & Jordan G. Nestor & Chunhua Weng & George Hripcsak & Peter C. Harris & Ali G. Gharavi & Krzysztof Kiryluk, 2023. "Polygenic risk alters the penetrance of monogenic kidney disease," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    4. Jiacheng Miao & Hanmin Guo & Gefei Song & Zijie Zhao & Lin Hou & Qiongshi Lu, 2023. "Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
