Printed from https://ideas.repec.org/a/gam/jagris/v11y2021i10p932-d644569.html

Development of a Genomic Prediction Pipeline for Maintaining Comparable Sample Sizes in Training and Testing Sets across Prediction Schemes Accounting for the Genotype-by-Environment Interaction

Authors listed:
  • Reyna Persa

    (Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68588, USA)

  • Martin Grondona

    (Advanta Seeds, College Station, TX 77845, USA)

  • Diego Jarquin

    (Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68588, USA)

Abstract

The growing global population faces challenges in satisfying food supply demands in a world of rapidly changing environmental conditions, which complicate the development of stable cultivars. Emerging methodologies that use molecular marker information, such as marker-assisted selection (MAS) and genomic selection (GS), have been widely adopted to assist the development of improved genotypes. In general, implementing GS is not straightforward: it usually requires cross-validation studies to find the optimal set of factors (training set size, number of markers, quality control, etc.) for real breeding applications. In most cases, these scenarios (combinations of several factors) differ in the levels of only a single factor while the levels of the other factors stay fixed, which allows previously developed routines to be reused. In this study, we present a set of structured modules that are easy to assemble for constructing complex genomic prediction pipelines from scratch. We also propose a novel method for selecting training and testing sets of comparable sizes across different cross-validation schemes (CV2, predicting tested genotypes in observed environments; CV1, predicting untested genotypes in observed environments; CV0, predicting tested genotypes in novel environments; and CV00, predicting untested genotypes in novel environments). To show how our implementation works, we considered two real data sets. These correspond to selected samples of the USDA soybean collection (D1: 324 genotypes observed in 6 environments scored for 9 traits) and of the Soybean Nested Association Mapping (SoyNAM) experiment (D2: 324 genotypes observed in 6 environments scored for 6 traits).
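The four cross-validation schemes differ only in what is hidden from the model: individual genotype-environment records (CV2), whole genotypes (CV1), whole environments (CV0), or both (CV00). The sketch below illustrates one common way to build such partitions from a list of `(genotype, environment)` records. It is a generic illustration under stated assumptions, not the authors' pipeline, and it does not reproduce their method for keeping training and testing sizes comparable; the function names `assign_folds` and `split` and the record layout are hypothetical.

```python
import random


def assign_folds(items, k, seed=1):
    """Randomly assign each distinct item to one of k folds of near-equal size."""
    items = sorted(set(items))
    rng = random.Random(seed)
    rng.shuffle(items)
    return {item: i % k for i, item in enumerate(items)}


def split(records, scheme, k=5, fold=0, target_env=None, seed=1):
    """Return (train, test) lists of (genotype, environment) records.

    Hypothetical sketch of the four schemes:
      CV2:  records are assigned to folds, so a test genotype is usually
            still observed in other environments.
      CV1:  genotypes are assigned to folds, so every record of a test
            genotype is hidden.
      CV0:  the target environment is the test set; other environments train.
      CV00: like CV0, but genotypes in the test fold are also removed
            from the training environments.
    """
    if scheme == "CV2":
        folds = assign_folds(records, k, seed)
        test = [r for r in records if folds[r] == fold]
        train = [r for r in records if folds[r] != fold]
    elif scheme == "CV1":
        folds = assign_folds([g for g, _ in records], k, seed)
        test = [r for r in records if folds[r[0]] == fold]
        train = [r for r in records if folds[r[0]] != fold]
    elif scheme == "CV0":
        test = [r for r in records if r[1] == target_env]
        train = [r for r in records if r[1] != target_env]
    elif scheme == "CV00":
        folds = assign_folds([g for g, _ in records], k, seed)
        test = [r for r in records if r[1] == target_env and folds[r[0]] == fold]
        train = [r for r in records if r[1] != target_env and folds[r[0]] != fold]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return train, test


# Usage on a toy balanced design: 10 genotypes observed in 3 environments.
genotypes = [f"G{i}" for i in range(10)]
envs = ["E1", "E2", "E3"]
records = [(g, e) for g in genotypes for e in envs]
train, test = split(records, "CV00", k=5, fold=0, target_env="E3")
```

Note that in this naive version the test sets have very different sizes across schemes (for example, CV0 hides a whole environment while CV00 hides only a fraction of it), which is the kind of imbalance the article's size-matching method is designed to address.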
In addition, three prediction models were considered: one including the effects of environments and lines (M1: E + L); one including environments, lines, and the main effect of markers (M2: E + L + G); and one that also includes the interaction between markers and environments (M3: E + L + G + G×E). The results confirm that under the CV2 and CV1 schemes, moderate improvements in predictive ability can be obtained by including the interaction component, while for CV0 the results were mixed, and for CV00 no improvements were observed. However, for this last scenario, including weather and soil data could potentially enhance the results of the interaction model.
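In the reaction-norm formulation that is commonly used for models like M1 to M3 (assumed here; the abstract does not spell out the model equations), the marker main effect G enters through a genomic relationship matrix built from marker codes, and the G×E term enters through the element-wise (Hadamard) product of the record-level genomic kernel and an environment kernel that equals 1 when two records share an environment. The pure-Python sketch below builds those kernels; the centering-and-dividing-by-p scaling is one common choice, and the function names are hypothetical. The kernels would then be passed to a mixed-model or Bayesian solver, which is not shown.

```python
def genomic_relationship(X):
    """G = W W' / p, where W holds column-centred marker codes (n lines x p markers)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    W = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(a * b for a, b in zip(W[i], W[m])) / p for m in range(n)]
            for i in range(n)]


def record_kernels(records, geno_index, G):
    """Expand G to phenotype records and build the environment and G x E kernels.

    records:    list of (genotype, environment) pairs, one per phenotype record
    geno_index: maps a genotype id to its row in G
    Returns (K_G, K_E, K_GE) where K_G = Z_g G Z_g', K_E = Z_E Z_E',
    and K_GE is their element-wise product.
    """
    n = len(records)
    K_G = [[G[geno_index[records[r][0]]][geno_index[records[s][0]]]
            for s in range(n)] for r in range(n)]
    K_E = [[1.0 if records[r][1] == records[s][1] else 0.0
            for s in range(n)] for r in range(n)]
    K_GE = [[K_G[r][s] * K_E[r][s] for s in range(n)] for r in range(n)]
    return K_G, K_E, K_GE


# Usage: three lines genotyped at three markers (coded 0/1/2), three records.
X = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
G = genomic_relationship(X)
records = [("A", "E1"), ("B", "E1"), ("A", "E2")]
K_G, K_E, K_GE = record_kernels(records, {"A": 0, "B": 1, "C": 2}, G)
```

The Hadamard product keeps genomic similarity only between records observed in the same environment, which is why the interaction term can borrow information across related genotypes within an environment but contributes little when the target environment is entirely unobserved, consistent with the CV00 results described above.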

Suggested Citation

  • Reyna Persa & Martin Grondona & Diego Jarquin, 2021. "Development of a Genomic Prediction Pipeline for Maintaining Comparable Sample Sizes in Training and Testing Sets across Prediction Schemes Accounting for the Genotype-by-Environment Interaction," Agriculture, MDPI, vol. 11(10), pages 1-17, September.
  • Handle: RePEc:gam:jagris:v:11:y:2021:i:10:p:932-:d:644569

    Download full text from publisher

    File URL: https://www.mdpi.com/2077-0472/11/10/932/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2077-0472/11/10/932/
    Download Restriction: no

    References listed on IDEAS

    1. Gilles Charmet & Louis-Gautier Tran & Jérôme Auzanneau & Renaud Rincent & Sophie Bouchet, 2020. "BWGS: A R package for genomic selection and its application to a wheat breeding programme," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-20, April.
    2. Giovanny Covarrubias-Pazaran, 2016. "Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-15, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martina Hančová & Andrej Gajdoš & Jozef Hanč & Gabriela Vozáriková, 2021. "Estimating variances in time series kriging using convex optimization and empirical BLUPs," Statistical Papers, Springer, vol. 62(4), pages 1899-1938, August.
    2. Mathias Ruben Gemmer & Chris Richter & Yong Jiang & Thomas Schmutzer & Manish L Raorane & Björn Junker & Klaus Pillen & Andreas Maurer, 2020. "Can metabolic prediction be an alternative to genomic prediction in barley?," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-15, June.
    3. Luciano Rogério Braatz de Andrade & Massaine Bandeira e Sousa & Eder Jorge Oliveira & Marcos Deon Vilela de Resende & Camila Ferreira Azevedo, 2019. "Cassava yield traits predicted by genomic selection methods," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-22, November.
    4. Gaotian Zhang & Nicole M. Roberto & Daehan Lee & Steffen R. Hahnel & Erik C. Andersen, 2022. "The impact of species-wide gene expression variation on Caenorhabditis elegans complex traits," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    5. Md. S. Islam & Per McCord & Quentin D. Read & Lifang Qin & Alexander E. Lipka & Sushma Sood & James Todd & Marcus Olatoye, 2022. "Accuracy of Genomic Prediction of Yield and Sugar Traits in Saccharum spp. Hybrids," Agriculture, MDPI, vol. 12(9), pages 1-22, September.
    6. Takeshi Matsui & Martin N. Mullis & Kevin R. Roy & Joseph J. Hale & Rachel Schell & Sasha F. Levy & Ian M. Ehrenreich, 2022. "The interplay of additivity, dominance, and epistasis on fitness in a diploid yeast cross," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    7. Ibrahim ElBasyoni & Mohamed Saadalla & Stephen Baenziger & Harold Bockelman & Sabah Morsy, 2017. "Cell Membrane Stability and Association Mapping for Drought and Heat Tolerance in a Worldwide Wheat Collection," Sustainability, MDPI, vol. 9(9), pages 1-16, September.
    8. Mitchell J. Feldmann & Dominique D. A. Pincot & Glenn S. Cole & Steven J. Knapp, 2024. "Genetic gains underpinning a little-known strawberry Green Revolution," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
