Ensemble-based genomic prediction for maize flowering time reveals novel insights into trait genetic architecture and improves prediction for breeding applications

Tomura, S.; Powell, O. M.; Wilkinson, M. J.; Cooper, M.

2025-07-18 bioinformatics
10.1101/2025.07.15.664852 bioRxiv
While various genomic prediction models have been evaluated for their potential to accelerate genetic gain for multiple traits, no individual genomic prediction model has outperformed all others across all applications. As an alternative approach, ensembles of multiple individual genomic prediction models can be applied to utilise the complementary strengths of individual prediction models and offset the prediction errors of each. We used the EasiGP (Ensemble AnalySis with Interpretable Genomic Prediction) pipeline to investigate the performance of an ensemble approach, targeting flowering-time traits measured in two maize nested association mapping datasets. For both datasets, the ensemble-based prediction approach achieved higher prediction accuracy and lower prediction error across the flowering-time traits compared to each individual model. Multiple genomic regions known to contain key flowering-time related genes were repeatedly included as features across individual genomic prediction models, indicating the models successfully captured SNPs as features that are associated with genomic regions known to contain flowering-time genes. Although repeatability was high for some genomic regions, estimated marker effects varied across many genomic regions, suggesting that the models might also have captured different aspects of the genetic variation underlying the traits. The ensemble combination of the diverse views likely contributed to the improvement of prediction performance by the ensemble-based approach over the individual prediction models. Ensemble-based prediction can be applied to overcome limitations observed in the continuous exploration for the best individual genomic prediction models that can consistently achieve the highest prediction performance, thereby potentially contributing to improved prediction accuracy for applications in crop breeding.

Article summary

This study targets researchers interested in the performance of genomic prediction models. To demonstrate potential advantages of an ensemble of diverse individual genomic prediction models, we investigated the prediction of key flowering-time traits (days to anthesis and anthesis to silking interval) in two maize datasets. The ensemble approach consistently improved the prediction performance. The improvement was attributed to the offset of prediction errors by combining multiple different dimensions of trait genetic variation. Ensembles can lead to higher selection accuracy of desirable individuals for applications in crop breeding.
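
The error-offsetting idea described in the abstract can be illustrated with a toy simulation. The sketch below is not the EasiGP pipeline: the abstract does not state how the individual predictions are combined, so an unweighted mean is assumed here. The data are simulated rather than taken from the maize datasets, and the three "models" are hypothetical predictors whose errors are assumed to be independent. All function names are illustrative.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation, used here as a stand-in for prediction accuracy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mse(pred, truth):
    """Mean squared error between predicted and true trait values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def ensemble_mean(predictions):
    """Unweighted mean of the model predictions for each individual (assumed rule)."""
    return [sum(values) / len(values) for values in zip(*predictions)]


rng = random.Random(0)

# Simulated "true" trait values for 2000 individuals (not real data).
truth = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Three hypothetical models: each predicts the truth plus its own independent error.
models = [[t + rng.gauss(0.0, 1.0) for t in truth] for _ in range(3)]

ensemble = ensemble_mean(models)

for index, model in enumerate(models, start=1):
    print(f"model {index}: r = {pearson(model, truth):.3f}, MSE = {mse(model, truth):.3f}")
print(f"ensemble: r = {pearson(ensemble, truth):.3f}, MSE = {mse(ensemble, truth):.3f}")
```

Under the independence assumption, averaging shrinks the error variance, so the ensemble shows higher correlation with the truth and lower error than any single simulated model. When model errors are strongly correlated, the gain would be smaller.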

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. The Plant Genome: 53 papers in training set; Top 0.1%; 54.8%
50% of probability mass above
2. in silico Plants: 24 papers in training set; Top 0.1%; 7.5%
3. Frontiers in Plant Science: 240 papers in training set; Top 1%; 6.7%
4. Frontiers in Genetics: 197 papers in training set; Top 2%; 3.8%
5. G3 Genes|Genomes|Genetics: 351 papers in training set; Top 0.6%; 3.8%
6. Horticulture Research: 43 papers in training set; Top 0.8%; 2.2%
7. Theoretical and Applied Genetics: 46 papers in training set; Top 0.2%; 1.8%
8. Bioinformatics: 1061 papers in training set; Top 7%; 1.6%
9. Crop Science: 18 papers in training set; Top 0.2%; 1.4%
10. GENETICS: 189 papers in training set; Top 1.0%; 1.0%
11. PLOS Computational Biology: 1633 papers in training set; Top 21%; 0.9%
12. Plant Direct: 81 papers in training set; Top 2%; 0.9%
13. Scientific Reports: 3102 papers in training set; Top 70%; 0.9%
14. Bioinformatics Advances: 184 papers in training set; Top 4%; 0.8%
15. BMC Bioinformatics: 383 papers in training set; Top 7%; 0.8%
16. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 9%; 0.8%
17. Plant Communications: 35 papers in training set; Top 1%; 0.8%
18. BMC Genomics: 328 papers in training set; Top 6%; 0.7%
19. BMC Plant Biology: 47 papers in training set; Top 1%; 0.7%
20. G3: Genes, Genomes, Genetics: 222 papers in training set; Top 1%; 0.5%
21. Plant Physiology: 217 papers in training set; Top 3%; 0.5%
22. Heredity: 53 papers in training set; Top 0.4%; 0.5%
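
The statement that the top journal alone accounts for 50% of the predicted probability mass can be checked from the percentages listed above. The sketch below assumes the summary is a running sum over the ranked journals. The function name `journals_to_reach` is hypothetical and is not part of the site.

```python
from itertools import accumulate

# Predicted probabilities (%) for the 22 ranked journals, as listed above.
probabilities = [
    54.8, 7.5, 6.7, 3.8, 3.8, 2.2, 1.8, 1.6, 1.4, 1.0, 0.9,
    0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.5, 0.5, 0.5,
]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or None if the listed entries never reach it."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities, 50.0))
```

Because the first entry (54.8%) already exceeds 50%, a single journal is enough, which matches the summary line. The 22 listed journals do not sum to 100%, so higher thresholds may not be reachable from this list alone.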