Benchmarking SNP-Calling Accuracy Against Known Citrus Pedigrees Reveals Pangenome Advantages Over Linear References

Kuster, R. D.; Sisler, P.; Sandhu, K.; Yin, L.; Niece, S.; Krueger, R.; Dardick, C.; Keremane, M.; Ramadugu, C.; Staton, M. E.

Posted 2026-04-09 · genomics
bioRxiv · DOI: 10.64898/2026.04.07.716967

Background: Pangenomes are a promising approach for reducing reference bias in genotyping, but their reliability for tracking variation across species remains unclear. To test the utility of graph-based pangenomes for interspecific breeding, we developed a Minigraph-Cactus super-pangenome representing four Citrus species derived from the founder lines of a citrus breeding program. To benchmark SNP-calling accuracy for graph-based versus linear approaches, we performed whole-genome short-read sequencing on two sets of pedigreed progeny: 30 F1 hybrids and 244 advanced hybrids from an F1 crossed with a parent not included in the pangenome.

Results: The linear approach yielded more SNP calls than the graph-based approach; however, both methods exhibited similar Mendelian Inheritance Error Rates (MIER) in a tool-dependent manner. Reconstruction of parental haplotype blocks in the advanced hybrids revealed a striking improvement in performance for the pangenome graph-based calls, suggesting that MIER is vulnerable to error when reference bias influences both parental and progeny genotype calls. Masking regions diverged from the reference path improved MIER accuracy metrics and haplotype block reconstruction in both the linear and graph-based SNP calls.

Conclusions: In non-model systems, inheritance patterns observed in pedigreed hybrids provide a framework for benchmarking variant-calling accuracy using pangenomes. Because SNP miscalls originating from diverged regions can falsely satisfy MIER filters, we recommend haplotype block reconstruction as a complementary benchmark. The inherent structure of the pangenome graph has promising applications for removing regions of unreliable mapping quality, which cannot otherwise be reliably removed using traditional filtering metrics.
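The MIER metric used in the abstract counts trios in which a child's genotype cannot be produced by drawing one allele from each parent. A minimal sketch of that check, not the authors' pipeline; the genotype encoding and the example trios are illustrative:

```python
def is_consistent(child, p1, p2):
    """True if the child's genotype can be formed from one allele of each parent."""
    a, b = child
    return (a in p1 and b in p2) or (b in p1 and a in p2)

def mier(trios):
    """Mendelian Inheritance Error Rate over (child, parent1, parent2) genotype trios."""
    errors = sum(not is_consistent(c, p1, p2) for c, p1, p2 in trios)
    return errors / len(trios)

# Illustrative biallelic trios at four sites (child, parent1, parent2)
trios = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent: A from p1, G from p2
    (("G", "G"), ("A", "A"), ("A", "G")),  # error: child must inherit an A from p1
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("A", "G"), ("A", "A"), ("A", "A")),  # error: neither parent carries G
]
print(mier(trios))  # → 0.5
```

As the Results note, a low MIER alone is not sufficient: a miscall shared by parent and progeny (e.g. from reference bias in a diverged region) still passes this per-site test, which is why haplotype block reconstruction exposed errors that MIER did not.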

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Top %  Probability
 1    The Plant Genome                                    53                      0.1%   27.6%
 2    Frontiers in Plant Science                          240                     0.6%   12.7%
 3    BMC Genomics                                        328                     0.2%   8.2%
 4    Horticulture Research                               43                      0.5%   4.2%
---- 50% of probability mass above ----
 5    PLOS ONE                                            4510                    36%    4.0%
 6    Methods in Ecology and Evolution                    160                     0.8%   4.0%
 7    G3                                                  33                      0.1%   2.7%
 8    GigaScience                                         172                     0.8%   2.6%
 9    Scientific Reports                                  3102                    46%    2.6%
10    The Plant Journal                                   197                     2%     2.6%
11    Molecular Ecology Resources                         161                     0.4%   2.4%
12    Plant Biotechnology Journal                         56                      0.5%   2.4%
13    BMC Bioinformatics                                  383                     4%     1.8%
14    Applications in Plant Sciences                      21                      0.2%   1.7%
15    Bioinformatics                                      1061                    7%     1.7%
16    G3: Genes, Genomes, Genetics                        222                     0.5%   1.5%
17    Computational and Structural Biotechnology Journal  216                     6%     1.3%
18    Bioinformatics Advances                             184                     4%     1.2%
19    G3 Genes|Genomes|Genetics                           351                     2%     0.9%
20    Frontiers in Genetics                               197                     8%     0.9%
21    NAR Genomics and Bioinformatics                     214                     4%     0.7%
22    BMC Plant Biology                                   47                      1.0%   0.7%
23    Genome Biology                                      555                     7%     0.7%
24    Gigabyte                                            60                      2%     0.6%
25    PLANTS, PEOPLE, PLANET                              21                      0.9%   0.6%
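The "top 4 journals account for 50% of the predicted probability mass" claim can be verified directly from the table: a running sum of the per-journal probabilities first crosses 50% at rank 4. A small sketch (probabilities copied from the top ten rows above):

```python
# Predicted probabilities (%) for the top-10 journals, from the table above
probs = [27.6, 12.7, 8.2, 4.2, 4.0, 4.0, 2.7, 2.6, 2.6, 2.6]

cum = 0.0
for rank, p in enumerate(probs, start=1):
    cum += p
    if cum >= 50.0:
        break

print(rank, round(cum, 1))  # → 4 52.7
```

The top three journals reach only 48.5%, so the fourth (Horticulture Research, 4.2%) is what pushes the cumulative mass past the 50% line drawn in the table.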