CLCNet: a contrastive learning and chromosome-aware network for genomic prediction in plants

Huang, J.; Yang, Z.; Yin, M.; Li, C.; Li, J.; Wang, Y.; Huang, L.; He, F.; Liang, C.; Li, M.; Han, R.; Jiang, Y.

2024-12-30 genomics
10.1101/2024.12.29.630569 bioRxiv

Genomic selection (GS) uses genome-wide markers and phenotypes to predict complex traits and breeding values. The effectiveness of GS critically depends on the accuracy of genomic prediction (GP) models. However, traditional GP models frequently struggle to capture inter-individual variability and face the curse of dimensionality, with features such as SNPs far exceeding sample sizes, which severely restricts their predictive performance. To address these challenges, we present CLCNet (Contrastive Learning and Chromosome-aware Network), a novel deep learning framework that integrates multi-task learning with contrastive learning for GP. CLCNet comprises two key components: (i) a contrastive learning module that enhances the model's ability to capture fine-grained, genotype-dependent phenotypic differences among individuals, and (ii) a chromosome-aware module that performs structured feature selection at both chromosome and genome levels, retaining the most informative SNPs. CLCNet was evaluated across four major crop species, maize (Zea mays), cotton (Gossypium hirsutum), rapeseed (Brassica napus), and soybean (Glycine max), covering ten agronomically important traits, and was compared with a diverse set of classical linear, machine learning, and deep learning models. Across most traits, CLCNet achieved top prediction performance, with statistically significant improvements in Pearson correlation coefficient (PCC), typically ranging from 0.3% to 6.5% over strong baseline models, together with reduced mean squared error (MSE). Notably, the advantages of CLCNet were pronounced for traits in maize, rapeseed, and soybean, while for cotton traits largely governed by additive genetic effects, its performance remained stable and did not decline. Overall, these results demonstrate that CLCNet provides a robust and effective framework for improving genomic prediction accuracy and holds substantial potential for accelerating genetic gain in plant breeding.
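The abstract reports both PCC and MSE. As a minimal sketch of why the two metrics are complementary, the following computes each from scratch. The function names and the phenotype values are made up for illustration and are not taken from the paper or its code.

```python
import math

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical phenotype values: predictions are shifted by a constant 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

pcc = pearson_cc(observed, predicted)          # ranking is perfect
mse = mean_squared_error(observed, predicted)  # but every value is off by 0.5
print(f"PCC = {pcc:.3f}, MSE = {mse:.3f}")
```

In this toy case the predictions rank individuals perfectly, so PCC is 1, yet the constant offset still produces a nonzero MSE. This is one reason a study might report a reduction in MSE alongside a gain in PCC.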

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. New Phytologist: 309 papers in training set; Top 0.5%; 9.9%
2. Plant Phenomics: 17 papers in training set; Top 0.1%; 7.1%
3. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set; Top 0.1%; 6.7%
4. Bioinformatics Advances: 184 papers in training set; Top 0.5%; 6.3%
5. Frontiers in Plant Science: 240 papers in training set; Top 2%; 6.2%
6. The Plant Genome: 53 papers in training set; Top 0.1%; 4.8%
7. Frontiers in Genetics: 197 papers in training set; Top 1%; 4.8%
8. Horticulture Research: 43 papers in training set; Top 0.4%; 4.3%
50% of probability mass above
9. Bioinformatics: 1061 papers in training set; Top 6%; 3.2%
10. in silico Plants: 24 papers in training set; Top 0.1%; 3.0%
11. Scientific Reports: 3102 papers in training set; Top 43%; 2.8%
12. BMC Genomics: 328 papers in training set; Top 1%; 2.7%
13. Briefings in Bioinformatics: 326 papers in training set; Top 3%; 2.4%
14. Nature Communications: 4913 papers in training set; Top 47%; 2.0%
15. PLOS ONE: 4510 papers in training set; Top 51%; 1.9%
16. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 4%; 1.9%
17. Plant Communications: 35 papers in training set; Top 0.8%; 1.7%
18. NAR Genomics and Bioinformatics: 214 papers in training set; Top 2%; 1.7%
19. Cell Genomics: 162 papers in training set; Top 3%; 1.7%
20. Nature Machine Intelligence: 61 papers in training set; Top 2%; 1.7%
21. Theoretical and Applied Genetics: 46 papers in training set; Top 0.3%; 1.2%
22. The Plant Journal: 197 papers in training set; Top 3%; 1.2%
23. Communications Biology: 886 papers in training set; Top 15%; 1.2%
24. PLOS Computational Biology: 1633 papers in training set; Top 22%; 0.9%
25. GigaScience: 172 papers in training set; Top 3%; 0.8%
26. iScience: 1063 papers in training set; Top 33%; 0.7%
27. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 45%; 0.7%
28. PLOS Genetics: 756 papers in training set; Top 15%; 0.7%
29. Plant Biotechnology Journal: 56 papers in training set; Top 1%; 0.7%
30. Methods in Ecology and Evolution: 160 papers in training set; Top 3%; 0.6%
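
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below is illustrative only: the helper name `journals_to_reach` is hypothetical, and the inputs are the rounded values shown on the page, so the running total is approximate.

```python
def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) at which the running sum first
    meets the threshold, or None if it is never reached."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None

# Predicted probabilities (%) for the ten highest-ranked journals, as displayed.
top_probs = [9.9, 7.1, 6.7, 6.3, 6.2, 4.8, 4.8, 4.3, 3.2, 3.0]

rank, total = journals_to_reach(top_probs, 50.0)
print(f"{rank} journals reach {total:.1f}% of the probability mass")
```

With these rounded inputs the running sum first reaches 50% at the eighth journal, which agrees with the cutoff marked in the list.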