
Next-Generation Soybean Haplotype Map as a Genomic Resource for Enhanced Trait Discovery and Functional Analysis

Khan, A. W.; Doddamani, D.; Song, Q.; Vuong, T. D.; Chhapekar, S. S.; Ye, H.; Garg, V.; Varshney, R. K.; Nguyen, H. T.

2026-03-26 genomics
10.64898/2026.03.24.713798 bioRxiv

We present a global soybean haplotype map generated from whole-genome sequencing of 1,278 Glycine max and Glycine soja accessions, comprising 11.37 million SNPs and 2.05 million short insertions and deletions (indels). This map (GmHapMap-II) captures an unprecedented breadth of genetic diversity across the worldwide soybean gene pool. Population structure analyses revealed six geographically distinct subpopulations that influenced patterns of linkage disequilibrium (LD) and recombination. Using the haplotype map, we identified novel genomic regions on chromosome 15 associated with crude protein content that had not been detected with a lower-density SNP array, and LD-based haplotype analysis revealed a superior haplotype for this trait. The map also enabled detailed characterization of haplotype diversity and copy number variation (CNV) at the soybean cyst nematode (SCN) resistance loci rhg-1 and Rhg-4, revealing novel haplotype structures and germplasm lines with higher copy numbers than previously characterized genotypes. We used the HapMap variant matrix, spanning multiple classes of variation, in a machine learning (ML)-based genomic prediction approach to predict SCN phenotypes, and catalogued the gene-centric haplotypes in a user-friendly database. The analysis also revealed the extent of deleterious alleles in soybean germplasm and how breeders have deployed beneficial alleles and purged deleterious ones. The haplotype map will serve as a major genomic resource for trait mapping and for genomics-enabled development of improved cultivars.

Matching journals

The top 4 journals together account for just over 50% (52.9%) of the predicted probability mass.

1. Horticulture Research: 21.8% (43 papers in training set; Top 0.1%)
2. Nature Communications: 18.0% (4913 papers in training set; Top 7%)
3. Plant Biotechnology Journal: 6.9% (56 papers in training set; Top 0.1%)
4. The Plant Journal: 6.2% (197 papers in training set; Top 0.9%)
--- 50% of probability mass above ---
5. Scientific Reports: 3.5% (3102 papers in training set; Top 39%)
6. Frontiers in Genetics: 3.5% (197 papers in training set; Top 2%)
7. Plant Communications: 3.5% (35 papers in training set; Top 0.4%)
8. Frontiers in Plant Science: 3.0% (240 papers in training set; Top 3%)
9. Genomics, Proteomics & Bioinformatics: 2.6% (171 papers in training set; Top 2%)
10. Communications Biology: 2.5% (886 papers in training set; Top 4%)
11. Cell Genomics: 2.0% (162 papers in training set; Top 3%)
12. The Plant Genome: 1.8% (53 papers in training set; Top 0.3%)
13. Genome Biology: 1.8% (555 papers in training set; Top 4%)
14. PLOS ONE: 1.8% (4510 papers in training set; Top 51%)
15. Molecular Plant: 1.6% (36 papers in training set; Top 0.8%)
16. Cell Reports: 1.2% (1338 papers in training set; Top 29%)
17. Genome Medicine: 1.1% (154 papers in training set; Top 6%)
18. New Phytologist: 0.9% (309 papers in training set; Top 4%)
19. Briefings in Bioinformatics: 0.9% (326 papers in training set; Top 6%)
20. BMC Genomics: 0.9% (328 papers in training set; Top 4%)
21. International Journal of Biological Macromolecules: 0.9% (65 papers in training set; Top 3%)
22. Nucleic Acids Research: 0.7% (1128 papers in training set; Top 18%)
23. Nature Genetics: 0.7% (240 papers in training set; Top 8%)
24. GigaScience: 0.7% (172 papers in training set; Top 3%)
25. Scientific Data: 0.6% (174 papers in training set; Top 3%)
26. Genes: 0.6% (126 papers in training set; Top 4%)
27. Science Advances: 0.6% (1098 papers in training set; Top 34%)
28. Journal of Genetics and Genomics: 0.6% (36 papers in training set; Top 3%)
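The "top 4 journals account for 50%" statement can be checked by accumulating the listed probabilities in rank order until the running total reaches the threshold. The following is a minimal Python sketch that assumes the percentages shown are the model's per-journal probabilities, rounded to one decimal place, so the cumulative sum is only approximate. `top_k_for_mass` is a hypothetical helper written for illustration and is not part of whatever tool produced this page.

```python
# Predicted probabilities (%) for the highest-ranked journals, copied from the
# list above. Only the first few entries are needed for a 50% check.
probs = [
    ("Horticulture Research", 21.8),
    ("Nature Communications", 18.0),
    ("Plant Biotechnology Journal", 6.9),
    ("The Plant Journal", 6.2),
    ("Scientific Reports", 3.5),
    ("Frontiers in Genetics", 3.5),
]


def top_k_for_mass(ranked, threshold):
    """Hypothetical helper: return (k, cumulative %) for the smallest k whose
    cumulative probability reaches `threshold` percent.

    `ranked` must already be sorted by descending probability. If the
    threshold is never reached, the full length and total are returned.
    """
    total = 0.0
    for k, (_, p) in enumerate(ranked, start=1):
        total += p
        if total >= threshold:
            return k, round(total, 1)
    return len(ranked), round(total, 1)


k, mass = top_k_for_mass(probs, 50.0)
print(k, mass)  # 4 52.9
```

The first three journals sum to 46.7%, which falls short of half, so a fourth entry is needed to cross the 50% line.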