Author
Listed:
- Noah Klimkowski Arango
- Fabio Morgante
Abstract
Accurate prediction of complex traits is an important task in quantitative genetics. Genotypes have been used for trait prediction using a variety of methods such as mixed models, Bayesian methods, penalized regression methods, dimension reduction methods, and machine learning methods. Recent studies have shown that gene expression levels can produce higher prediction accuracy than genotypes. However, only a few prediction methods were tested in these studies. Thus, a comprehensive assessment of methods is needed to fully evaluate the potential of gene expression as a predictor of complex trait phenotypes. Here, we used data from the Drosophila Genetic Reference Panel (DGRP) to compare the ability of several existing statistical learning methods to predict starvation resistance and startle response from gene expression in the two sexes separately. The methods considered differ in assumptions about the distribution of gene effects (ranging from models that assume that every gene affects the trait to more sparse models) and their ability to capture gene-gene interactions. We also used functional annotation (i.e., Gene Ontology (GO)) as a source of biological information to inform prediction models. The results show that differences in prediction accuracy exist. For example, methods performing variable selection achieved higher prediction accuracy for starvation resistance in females, while they generally had lower accuracy for startle response in both sexes. Incorporating GO annotations further improved prediction accuracy for a few GO terms of biological significance. Biological significance extended to the genes underlying highly predictive GO terms. Notably, the Insulin-like Receptor (InR) was prevalent across methods and sexes for starvation resistance. For startle response, crumbs (crb) and imaginal disc growth factor 2 (Idgf2) were found for females and males, respectively. Our results confirmed the potential of transcriptomic prediction and highlighted the importance of selecting appropriate methods and strategies in order to achieve accurate predictions.
Suggested Citation
Noah Klimkowski Arango & Fabio Morgante, 2025.
"Comparing statistical learning methods for complex trait prediction from gene expression,"
PLOS ONE, Public Library of Science, vol. 20(2), pages 1-20, February.
Handle: RePEc:plo:pone00:0317516
DOI: 10.1371/journal.pone.0317516