CLCNet: a contrastive learning and chromosome-aware network for genomic prediction in plants

Huang, J.; Yang, Z.; Yin, M.; Li, C.; Li, J.; Wang, Y.; Huang, L.; He, F.; Liang, C.; Li, M.; Han, R.; Jiang, Y.

2024-12-30 genomics

10.1101/2024.12.29.630569 bioRxiv


Genomic selection (GS) uses genome-wide markers and phenotypes to predict complex traits and breeding values. The effectiveness of GS critically depends on the accuracy of genomic prediction (GP) models. However, traditional GP models frequently struggle to capture inter-individual variability and are subject to the curse of dimensionality, with features such as SNPs far exceeding sample sizes, which severely restricts their predictive performance. To address these challenges, we present CLCNet (Contrastive Learning and Chromosome-aware Network), a novel deep learning framework that integrates multi-task learning with contrastive learning for GP. CLCNet comprises two key components: (i) a contrastive learning module that enhances the model's ability to capture fine-grained, genotype-dependent phenotypic differences among individuals, and (ii) a chromosome-aware module that performs structured feature selection at both chromosome and genome levels, retaining the most informative SNPs. CLCNet was evaluated across four major crop species, maize (Zea mays), cotton (Gossypium hirsutum), rapeseed (Brassica napus), and soybean (Glycine max), covering ten agronomically important traits, and was compared with a diverse set of classical linear, machine learning, and deep learning models. Across most traits, CLCNet achieved top prediction performance, with statistically significant improvements in Pearson correlation coefficient (PCC), typically ranging from 0.3% to 6.5% over strong baseline models, together with reduced mean squared error (MSE). Notably, the advantages of CLCNet were pronounced for traits in maize, rapeseed, and soybean, while for cotton traits largely governed by additive genetic effects, its performance remained stable and did not decline. Overall, these results demonstrate that CLCNet provides a robust and effective framework for improving genomic prediction accuracy and holds substantial potential for accelerating genetic gain in plant breeding.
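
For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how PCC and MSE are conventionally computed between observed and predicted phenotypes. It is not taken from the CLCNet code; the function names `pearson_cc` and `mse` and the toy values are assumptions made for illustration only.

```python
import math

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sum of cross-products and sums of squares of the centered values
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical phenotype values, for illustration only
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(pearson_cc(observed, predicted), mse(observed, predicted))
```

A higher PCC indicates that predictions rank and scale individuals more consistently with the observed phenotypes, while a lower MSE indicates smaller absolute prediction errors; the two can move independently, which is presumably why the abstract reports both.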
