Predicting nonlinear genetic relationships between traits in multi-trait evaluations by using a GBLUP-assisted Deep Learning model

Shokor, F.; Croiseau, P.; Gangloff, H.; Saintilan, R.; Tribout, T.; Mary-Huard, T.; C.D Cuyabano, B.

2024-03-27 genomics
10.1101/2024.03.23.585208 bioRxiv

Background: Genomic prediction aims to predict the breeding values of multiple complex traits. The widely used statistical methods usually assume these traits to be normally distributed, which imposes linear genetic correlations between traits. While statistical methods are of great value for genomic prediction, they do not account for nonlinear genetic relationships between traits. If such relationships exist, statistical models still provide a fair linear approximation, but their prediction accuracy is limited by the nonlinearity. Deep learning (DL) is a promising methodology for predicting multiple complex traits in scenarios where nonlinear genetic relationships are present, owing to its capacity to capture complex and nonlinear patterns in large data. We proposed a novel hybrid DLGBLUP model, which takes the output of the traditional GBLUP and enhances its predicted genetic values (PGV) by using DL to account for nonlinear genetic relationships between traits. Using simulated data, we compared the accuracy of the PGV obtained with the proposed hybrid DLGBLUP model, a DL model, and the traditional GBLUP model, the latter being our baseline reference.

Results: Both the DL and DLGBLUP models either outperformed GBLUP or yielded equally accurate PGV, with particularly greater accuracy for traits with a strongly characterized nonlinear genetic relationship. Overall, DLGBLUP had the highest prediction accuracy, up to 0.2 points higher than GBLUP, and the smallest mean squared error of the PGV for all traits. Additionally, we evolved a base population over seven generations and compared the genetic progress when selecting individuals based on the additive PGV obtained by DL, DLGBLUP, or GBLUP. For all traits with a nonlinear genetic relationship, after the fourth generation, the genetic gain observed when selection was based on the additive PGV from GBLUP was always lower than the gain achieved with either DL or DLGBLUP.

Conclusions: Integrating DL into genomic prediction makes it possible to model nonlinear relationships between traits. By identifying these nonlinear genetic relationships, our DL and DLGBLUP models improved prediction accuracy compared to GBLUP. The possibility of nonlinear relationships between traits offers a different perspective on multi-trait evaluation and prediction, as well as on the evolution of traits over generations, with the potential to further improve selection strategies in commercial livestock breeding programs. Moreover, DLGBLUP shows that DL can complement statistical methods by enhancing their performance.
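
The two-stage idea in the abstract (GBLUP first, then a nonlinear correction across traits) can be illustrated with a minimal sketch. This is not the authors' implementation: a polynomial least-squares fit stands in for the DL network, and the simulated data, the sample sizes, the shrinkage value `lam`, and the helper names `vanraden_g`, `gblup`, `features`, and `accuracy` are all assumptions made for illustration.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from a 0/1/2 genotype matrix (VanRaden-style scaling)."""
    p = M.mean(axis=0) / 2.0                      # estimated allele frequencies
    Z = M - 2.0 * p                               # centered genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(G, y, train, lam):
    """Single-trait GBLUP with an assumed variance ratio lam = sigma_e^2 / sigma_g^2.

    Returns values for all individuals: fitted values for the training set,
    predictions for everyone else.
    """
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(train.size)
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[:, train] @ alpha

def features(u1, u2):
    """Polynomial features of the GBLUP outputs (stand-in for a DL network)."""
    return np.column_stack([np.ones_like(u1), u1, u1 ** 2, u2])

def accuracy(pred, true):
    """Pearson correlation between predicted and true genetic values."""
    return float(np.corrcoef(pred, true)[0, 1])

# Toy simulation (assumed sizes): trait 2 is a nonlinear function of trait 1.
rng = np.random.default_rng(1)
n, m = 400, 300
freq = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freq, size=(n, m)).astype(float)
G = vanraden_g(M)

g1 = (M - M.mean(axis=0)) @ rng.normal(size=m)
g1 = (g1 - g1.mean()) / g1.std()                  # true genetic value, trait 1
g2 = g1 ** 2                                      # nonlinear genetic relationship
y1 = g1 + rng.normal(size=n)                      # phenotypes = genetic value + noise
y2 = g2 + rng.normal(size=n)

train = np.arange(0, 300)
test = np.arange(300, n)

# Stage 1: linear GBLUP per trait.
pgv1 = gblup(G, y1, train, lam=1.0)
pgv2 = gblup(G, y2, train, lam=1.0)

# Stage 2: nonlinear map from the GBLUP outputs of both traits to trait 2,
# fitted on training individuals and applied to test individuals.
coef, *_ = np.linalg.lstsq(features(pgv1[train], pgv2[train]), y2[train], rcond=None)
hybrid2 = features(pgv1[test], pgv2[test]) @ coef

acc_gblup = accuracy(pgv2[test], g2[test])
acc_hybrid = accuracy(hybrid2, g2[test])
print(f"trait 2 accuracy, GBLUP only: {acc_gblup:.2f}")
print(f"trait 2 accuracy, GBLUP + nonlinear stage: {acc_hybrid:.2f}")
```

The printed accuracies depend on the simulated data and the seed, so they say nothing about the results reported in the preprint. The sketch only shows where a nonlinear second stage sits relative to the linear GBLUP step.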

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Genetics Selection Evolution; 33 papers in training set; Top 0.1%; 40.3%
2. BMC Genomics; 328 papers in training set; Top 0.2%; 7.3%
3. Scientific Reports; 3102 papers in training set; Top 22%; 5.0%
(50% of probability mass above)
4. Frontiers in Genetics; 197 papers in training set; Top 1%; 4.4%
5. PLOS ONE; 4510 papers in training set; Top 34%; 4.3%
6. Aquaculture; 29 papers in training set; Top 0.2%; 3.7%
7. Methods in Ecology and Evolution; 160 papers in training set; Top 0.9%; 3.1%
8. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 2%; 2.8%
9. GigaScience; 172 papers in training set; Top 0.9%; 2.1%
10. Bioinformatics Advances; 184 papers in training set; Top 3%; 1.8%
11. Journal of Dairy Science; 11 papers in training set; Top 0.1%; 1.7%
12. Biology Methods and Protocols; 53 papers in training set; Top 0.8%; 1.7%
13. PLOS Computational Biology; 1633 papers in training set; Top 16%; 1.7%
14. Gigabyte; 60 papers in training set; Top 0.8%; 1.5%
15. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 2%; 1.4%
16. G3: Genes, Genomes, Genetics; 222 papers in training set; Top 0.6%; 1.3%
17. Frontiers in Plant Science; 240 papers in training set; Top 4%; 1.3%
18. BMC Bioinformatics; 383 papers in training set; Top 6%; 1.0%
19. Bioinformatics; 1061 papers in training set; Top 9%; 0.9%
20. Molecular Ecology Resources; 161 papers in training set; Top 1.0%; 0.8%
21. The Plant Genome; 53 papers in training set; Top 0.6%; 0.8%
22. Biology; 43 papers in training set; Top 3%; 0.7%
23. Genes; 126 papers in training set; Top 4%; 0.7%
24. NAR Genomics and Bioinformatics; 214 papers in training set; Top 5%; 0.5%
25. Poultry Science; 10 papers in training set; Top 0.3%; 0.5%