Author
Listed:
- Amir Morshedian
- Mike Domaratzki
Abstract
This paper presents a deep-learning framework that combines an LSTM, a graph neural network (GNN), and transformer-style attention to model genotype–environment (G×E) effects for maize yield prediction. Weather data for a growing season are summarized by an LSTM into a 21-dimensional embedding that serves as the environment node feature; 437,214 SNPs are reduced to 548 principal components that instantiate the genotype nodes. Multi-head attention dynamically weights the edges during message passing. Three architectures are compared: A (fully bipartite graph), B (A with intra-set top-k similarity links within genotypes and within environments), and C (B with a single learnable supernode readout that attends over all nodes after message passing). The joint representations feed a compact MLP for yield prediction. Using a forward-time split (2014–2021 train; 2022 test with unseen genotypes and unseen environments), performance improves monotonically from A to C: A (RMSE 2.7749, PCC 0.4115, R² 0.1693), B (2.3683, 0.6622, 0.4385), C (2.2120, 0.6945, 0.4823). Compared with A, C reduces RMSE by 0.5629 (∼20.3%) and increases PCC by 0.283 (∼68.8%), indicating that global, content-adaptive aggregation promotes local G×E propagation. The performance of the proposed approach remains consistent regardless of the number of genotypes per environment and stays strong under variable or unbalanced genotype sampling across environments. The proposed approach is also compared with methods from the Global G×E Prediction Competition: two of the three architectures improve predictive performance, and the best architecture achieves a lower RMSE (2.2120) and a higher Pearson correlation (0.6945) than the competition-winning model.
Author summary: This paper considers how plant genomics and environmental effects jointly influence yield. Using a maize dataset that combines nearly 5,000 varieties of the crop across 280 location-year combinations, we predict the yield of a variety when grown in a particular environment. The environmental data used directly in the prediction include solar radiation, temperature, wind speed and precipitation.
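The graph structures named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It builds the edge list for architecture A (every genotype linked to every environment) and adds the intra-set top-k links of architecture B. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, as are the helper names (`bipartite_edges`, `top_k_edges`), the directed edge convention, and the toy 3-dimensional feature values. Attention weighting, the supernode of architecture C, and the MLP are omitted.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bipartite_edges(n_geno, n_env):
    """Architecture A: connect every genotype node to every environment node."""
    return [(("G", g), ("E", e)) for g, e in product(range(n_geno), range(n_env))]


def top_k_edges(features, k, tag):
    """Architecture B add-on: link each node to its k most similar same-set peers."""
    edges = []
    for i, fi in enumerate(features):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))  # most similar first, ties by index
        edges.extend(((tag, i), (tag, j)) for _, j in scored[:k])
    return edges


# Toy features (hypothetical values): 3-dim stand-ins for the 548 genotype
# principal components and the 21-dim LSTM weather embeddings.
geno = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
env = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]

edges_a = bipartite_edges(len(geno), len(env))
edges_b = edges_a + top_k_edges(geno, k=1, tag="G") + top_k_edges(env, k=1, tag="E")

print(len(edges_a), len(edges_b))
```

With 4 genotypes and 3 environments, A has one edge per genotype-environment pair, and B adds one outgoing similarity link per node when `k=1`. In a real pipeline these edge lists would be handed to a GNN library, and the value of `k` would be a tuned hyperparameter.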
Suggested Citation
Amir Morshedian & Mike Domaratzki, 2026.
"LSTM-attention-guided graph neural networks for integrated genotype–Environment modeling in maize yield prediction,"
PLOS Computational Biology, Public Library of Science, vol. 22(5), pages 1-20, May.
Handle:
RePEc:plo:pcbi00:1013729
DOI: 10.1371/journal.pcbi.1013729