Novel linkage disequilibrium-based genotype-by-environmental interaction method for genomic prediction of cotton yield and fibre quality traits

Li, Z.; Li, X.; Liu, S.; Wilson, I.; Zhu, Q.-H.; Stiller, W.; Conaty, W.

2026-05-06 plant biology
10.64898/2026.05.03.722538 bioRxiv
Genomic prediction (GP) across diverse environments has the potential to accelerate genetic gain in cotton breeding programs. A major challenge in GP is modelling genotype-by-environment interactions (GEI), which is essential for selecting stable, high-performing genotypes under variable production conditions. However, incorporating GEI into GP models increases dimensionality and computational complexity, and the resulting models risk being impractical for commercial breeding-scale data sets because of their run times and computational demands. This study addresses two primary aims. First, we evaluate the practical benefits of GEI-informed GP for predicting economically important cotton traits. Second, we develop and assess advanced statistical modelling strategies for integrating genomic and environmental data at scale. We propose a dimensionality reduction approach that combines linkage disequilibrium network analysis with principal component techniques to reduce redundancy while preserving informative variation. Using this reduced dataset, we implement Bayesian linear regression models and, for comparison, deep residual neural networks for genomic prediction. Analyses were conducted on a large multi-environment dataset from the CSIRO cotton breeding program, comprising 3,236 breeding lines, 54 environmental covariates, and 8,049 yield and fibre quality phenotype records collected over 10 years and 9 locations, representing 41 year-location combinations. Results demonstrate that Bayesian linear regression approaches generally outperform BG-BLUP models, with all three linear/linear mixed methods providing clearly more reliable performance than the deep learning models. These findings highlight the value of using interpretable statistical models for integrating genomic and environmental information to support selection decisions under diverse environmental conditions.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Frontiers in Plant Science; 240 papers in training set; Top 0.9%; 10.1%
2. PLOS ONE; 4510 papers in training set; Top 19%; 10.1%
3. Scientific Reports; 3102 papers in training set; Top 11%; 8.2%
4. Theoretical and Applied Genetics; 46 papers in training set; Top 0.1%; 7.2%
5. Frontiers in Genetics; 197 papers in training set; Top 0.9%; 6.3%
6. Crop Science; 18 papers in training set; Top 0.1%; 4.8%
7. New Phytologist; 309 papers in training set; Top 1%; 4.8%
50% of probability mass above
8. BMC Genomics; 328 papers in training set; Top 0.6%; 4.0%
9. The Plant Genome; 53 papers in training set; Top 0.2%; 4.0%
10. Plant Phenomics; 17 papers in training set; Top 0.1%; 3.6%
11. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 0.6%; 3.6%
12. The Plant Phenome Journal; 14 papers in training set; Top 0.1%; 3.6%
13. Biology Methods and Protocols; 53 papers in training set; Top 0.7%; 1.9%
14. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 0.4%; 1.7%
15. Methods in Ecology and Evolution; 160 papers in training set; Top 2%; 1.5%
16. The Plant Journal; 197 papers in training set; Top 3%; 1.2%
17. International Journal of Molecular Sciences; 453 papers in training set; Top 10%; 1.2%
18. GigaScience; 172 papers in training set; Top 2%; 1.2%
19. Plant Direct; 81 papers in training set; Top 2%; 0.9%
20. Plant Methods; 39 papers in training set; Top 0.6%; 0.9%
21. G3; 33 papers in training set; Top 0.4%; 0.9%
22. Phytopathology®; 28 papers in training set; Top 0.5%; 0.9%
23. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 9%; 0.7%
24. PLANTS, PEOPLE, PLANET; 21 papers in training set; Top 0.8%; 0.7%
25. Journal of Computational Biology; 37 papers in training set; Top 0.7%; 0.6%
26. PeerJ; 261 papers in training set; Top 17%; 0.6%
27. Scientific Data; 174 papers in training set; Top 3%; 0.6%
28. Bioinformatics Advances; 184 papers in training set; Top 5%; 0.6%
29. PLOS Computational Biology; 1633 papers in training set; Top 27%; 0.6%