Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits

Author

Listed:
  • Ciaran Michael Kelly
  • Russell Lewis McLaughlin

Abstract

We present a comparison of machine learning methods for the prediction of four quantitative traits in Arabidopsis thaliana. High prediction accuracies were achieved on individuals grown under standardized laboratory conditions from the 1001 Arabidopsis Genomes Project. An existing body of evidence suggests that linear models may be impeded by their inability to make use of non-additive effects to explain phenotypic variation at the population level. The results presented here use a nested cross-validation approach to confirm that some machine learning methods can statistically outperform linear prediction models, with the optimal model depending on the availability of training data and the genetic architecture of the trait in question. Linear models were competitive in their performance, as in previous work, though the neural network class of predictors was observed to be the most accurate and robust for traits with high heritability. Determining the extent to which non-linear models exploit interaction effects will require further investigation of the causal pathways that lie behind their predictions. Future work utilizing more traits and larger sample sizes, combined with an improved understanding of their respective genetic architectures, may lead to improvements in prediction accuracy.
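
The nested cross-validation approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, ridge regression stands in for the linear and machine learning models compared in the paper, Pearson correlation is assumed as the accuracy measure, and all function names (`ridge_fit`, `ridge_predict`, `nested_cv`, and so on) are hypothetical. The point is only the structure: an inner loop tunes a hyperparameter on the training portion, and an outer loop scores the tuned model on data it never saw during tuning.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; centering uses training data only."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, x_mean, y_mean

def ridge_predict(model, X):
    w, x_mean, y_mean = model
    return (X - x_mean) @ w + y_mean

def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def k_folds(indices, k, rng):
    """Shuffle the given indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(indices), k)

def nested_cv(X, y, lambdas, outer_k=5, inner_k=3, seed=0):
    rng = np.random.default_rng(seed)
    outer_folds = k_folds(np.arange(len(y)), outer_k, rng)
    outer_scores = []
    for i, test_idx in enumerate(outer_folds):
        train_idx = np.concatenate(
            [f for j, f in enumerate(outer_folds) if j != i])
        # Inner loop: pick lambda using only the outer-training samples.
        inner_folds = k_folds(train_idx, inner_k, rng)
        mean_inner = []
        for lam in lambdas:
            fold_scores = []
            for m, val_idx in enumerate(inner_folds):
                fit_idx = np.concatenate(
                    [f for n, f in enumerate(inner_folds) if n != m])
                model = ridge_fit(X[fit_idx], y[fit_idx], lam)
                fold_scores.append(
                    pearson_r(y[val_idx], ridge_predict(model, X[val_idx])))
            mean_inner.append(np.mean(fold_scores))
        best_lam = lambdas[int(np.argmax(mean_inner))]
        # Refit on all outer-training data, then score once on the held-out fold.
        model = ridge_fit(X[train_idx], y[train_idx], best_lam)
        outer_scores.append(
            pearson_r(y[test_idx], ridge_predict(model, X[test_idx])))
    return outer_scores

# Synthetic stand-in for a genotype matrix (assumption: 200 lines, 50 binary markers).
data_rng = np.random.default_rng(42)
X = data_rng.integers(0, 2, size=(200, 50)).astype(float)
true_w = data_rng.normal(size=50)
y = X @ true_w + data_rng.normal(scale=1.0, size=200)

scores = nested_cv(X, y, lambdas=[0.1, 1.0, 10.0, 100.0])
print("outer-fold correlations:", [round(s, 3) for s in scores])
```

Because the hyperparameter is chosen without ever touching the outer test fold, the outer-fold scores give a less optimistic estimate of prediction accuracy than tuning and scoring on the same splits would.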

Suggested Citation

  • Ciaran Michael Kelly & Russell Lewis McLaughlin, 2024. "Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits," PLOS ONE, Public Library of Science, vol. 19(8), pages 1-13, August.
  • Handle: RePEc:plo:pone00:0308962
    DOI: 10.1371/journal.pone.0308962

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0308962
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0308962&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li-Dunn Chen & Michael A Caprio & Devin M Chen & Andrew J Kouba & Carrie K Kouba, 2024. "Enhancing predictive performance for spectroscopic studies in wildlife science through a multi-model approach: A case study for species classification of live amphibians," PLOS Computational Biology, Public Library of Science, vol. 20(2), pages 1-24, February.
    2. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    3. Juliette Richetin & Giulio Costantini & Marco Perugini & Felix Schönbrodt, 2015. "Should We Stop Looking for a Better Scoring Algorithm for Handling Implicit Association Test Data? Test of the Role of Errors, Extreme Latencies Treatment, Scoring Formula, and Practice Trials on Reli," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-23, June.
    4. Leandro C. Hermida & E. Michael Gertz & Eytan Ruppin, 2022. "Predicting cancer prognosis and drug response from the tumor microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    5. Jonathan C. M. Wan & Dennis Stephens & Lingqi Luo & James R. White & Caitlin M. Stewart & Benoît Rousseau & Dana W. Y. Tsui & Luis A. Diaz, 2022. "Genome-wide mutational signatures in low-coverage whole genome sequencing of cell-free DNA," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    6. Steffen Steinert & Verena Ruf & David Dzsotjan & Nicolas Großmann & Albrecht Schmidt & Jochen Kuhn & Stefan Küchemann, 2024. "A refined approach for evaluating small datasets via binary classification using machine learning," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-21, May.
    7. Celia M. Gagliardi & Marc E. Normandin & Alexandra T. Keinath & Joshua B. Julian & Matthew R. Lopez & Manuel-Miguel Ramos-Alvarez & Russell A. Epstein & Isabel A. Muzzio, 2024. "Distinct neural mechanisms for heading retrieval and context recognition in the hippocampus during spatial reorientation," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    8. Jacob Beck, 2023. "Quality aspects of annotated data," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 17(3), pages 331-353, December.
    9. Sinha, Shruti & Sankar Rao, Chinta & Kumar, Abhishankar & Venkata Surya, Dadi & Basak, Tanmay, 2024. "Exploring and understanding the microwave-assisted pyrolysis of waste lignocellulose biomass using gradient boosting regression machine learning model," Renewable Energy, Elsevier, vol. 231(C).
    10. Pizarro, E. & Galleguillos, M. & Barría, P. & Callejas, R., 2022. "Irrigation management or climate change ? Which is more important to cope with water shortage in the production of table grape in a Mediterranean context," Agricultural Water Management, Elsevier, vol. 263(C).
    11. Reza Rezaee & Jamiu Ekundayo, 2022. "Permeability Prediction Using Machine Learning Methods for the CO2 Injectivity of the Precipice Sandstone in Surat Basin, Australia," Energies, MDPI, vol. 15(6), pages 1-15, March.
    12. Nica-Avram, Georgiana & Harvey, John & Smith, Gavin & Smith, Andrew & Goulding, James, 2021. "Identifying food insecurity in food sharing networks via machine learning," Journal of Business Research, Elsevier, vol. 131(C), pages 469-484.
    13. Kristof Lommers & Ouns El Harzli & Jack Kim, 2021. "Confronting Machine Learning With Financial Research," Papers 2103.00366, arXiv.org, revised Mar 2021.
    14. Claudia Kedor & Helma Freitag & Lil Meyer-Arndt & Kirsten Wittke & Leif G. Hanitsch & Thomas Zoller & Fridolin Steinbeis & Milan Haffke & Gordon Rudolf & Bettina Heidecker & Thomas Bobbert & Joachim S, 2022. "A prospective observational study of post-COVID-19 chronic fatigue syndrome following the first pandemic wave in Germany and biomarkers associated with symptom severity," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    15. Carlo Dindorf & Eva Bartaguiz & Freya Gassmann & Michael Fröhlich, 2022. "Conceptual Structure and Current Trends in Artificial Intelligence, Machine Learning, and Deep Learning Research in Sports: A Bibliometric Review," IJERPH, MDPI, vol. 20(1), pages 1-23, December.
    16. Zhou, Huanyu & Qiu, Yingning & Feng, Yanhui & Liu, Jing, 2022. "Power prediction of wind turbine in the wake using hybrid physical process and machine learning models," Renewable Energy, Elsevier, vol. 198(C), pages 568-586.
    17. Bhattacharjee, Biplab & Kumar, Rajiv & Senthilkumar, Arunachalam, 2022. "Unidirectional and bidirectional LSTM models for edge weight predictions in dynamic cross-market equity networks," International Review of Financial Analysis, Elsevier, vol. 84(C).
    18. Halinski, Rosana & Garibaldi, Lucas Alejandro & dos Santos, Charles Fernando & Acosta, André Luis & Guidi, Daniel Dornelles & Blochtein, Betina, 2020. "Forest fragments and natural vegetation patches within crop fields contribute to higher oilseed rape yields in Brazil," Agricultural Systems, Elsevier, vol. 180(C).
    19. Mahdi Goldani & Soraya Asadi Tirvan, 2024. "Sensitivity Assessing to Data Volume for forecasting: introducing similarity methods as a suitable one in Feature selection methods," Papers 2406.04390, arXiv.org.
    20. Xiaofeng Xu & Zhaoyuan Chen & Shixiang Chen, 2023. "Enhancing economic competitiveness analysis through machine learning: Exploring complex urban features," PLOS ONE, Public Library of Science, vol. 18(11), pages 1-27, November.
