Minimizing Reference Bias with an Impute-First Approach

Vaddadi, N. S. K.; Mun, T.; Langmead, B.

2023-12-02 bioinformatics
10.1101/2023.11.30.568362 bioRxiv

Pangenome indexes reduce reference bias in sequencing data analysis. However, bias can be reduced further by using a personalized reference, e.g. a diploid human reference constructed to match a donor individual's alleles. We present a novel impute-first alignment framework that combines elements of genotype imputation and pangenome alignment. It begins by genotyping the individual using only a subsample of the input reads. It next uses a reference panel and an efficient imputation algorithm to impute a personalized diploid reference. Finally, it indexes the personalized reference and applies a read aligner, which could be a linear or graph aligner, to align the full read set to the personalized reference. This framework achieves higher variant-calling recall (99.54% vs. 99.37%), precision (99.36% vs. 99.18%), and F1 (99.45% vs. 99.28%) compared to a graph pangenome aligner. The personalized reference is also smaller and faster to query compared to a pangenome index, making it an overall advantageous choice for whole-genome DNA sequencing experiments.
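As a reading aid, the reported F1 values can be checked against the reported precision and recall, assuming F1 here is the usual harmonic mean of the two. The sketch below is not taken from the preprint; the function name `f1_score` and the dictionary layout are illustrative assumptions, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as quoted in the abstract (already rounded to two decimals).
reported = {
    "impute_first": {"precision": 99.36, "recall": 99.54, "f1": 99.45},
    "graph_pangenome": {"precision": 99.18, "recall": 99.37, "f1": 99.28},
}

# Recompute F1 from the rounded precision and recall for each method.
recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, m in reported.items():
    print(f"{name}: recomputed F1 = {recomputed[name]:.3f}%, reported F1 = {m['f1']:.2f}%")
```

Because the inputs are themselves rounded, a recomputed value may differ from the reported one in the last decimal place; agreement to within about 0.01 percentage points is what this check can show.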

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Genome Research: 409 papers in training set, Top 0.1%, 22.6%
2. Bioinformatics: 1061 papers in training set, Top 1%, 18.6%
3. Nature Communications: 4913 papers in training set, Top 18%, 10.1%

50% of probability mass above

4. Genome Biology: 555 papers in training set, Top 1%, 6.3%
5. BMC Bioinformatics: 383 papers in training set, Top 2%, 4.9%
6. Genome Medicine: 154 papers in training set, Top 2%, 3.6%
7. Bioinformatics Advances: 184 papers in training set, Top 1%, 3.6%
8. Nucleic Acids Research: 1128 papers in training set, Top 7%, 2.7%
9. PLOS Computational Biology: 1633 papers in training set, Top 14%, 2.1%
10. PLOS ONE: 4510 papers in training set, Top 48%, 2.1%
11. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.9%
12. Briefings in Bioinformatics: 326 papers in training set, Top 4%, 1.7%
13. GigaScience: 172 papers in training set, Top 1%, 1.7%
14. Scientific Reports: 3102 papers in training set, Top 60%, 1.7%
15. Nature Methods: 336 papers in training set, Top 5%, 1.5%
16. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.3%, 1.3%
17. Cell Systems: 167 papers in training set, Top 10%, 1.0%
18. Nature Biotechnology: 147 papers in training set, Top 6%, 0.9%
19. Communications Biology: 886 papers in training set, Top 19%, 0.9%
20. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 5%, 0.9%
21. iScience: 1063 papers in training set, Top 32%, 0.7%
22. European Journal of Human Genetics: 49 papers in training set, Top 1%, 0.7%
23. BMC Genomics: 328 papers in training set, Top 7%, 0.6%
24. Nature Machine Intelligence: 61 papers in training set, Top 4%, 0.6%
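
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a running sum over the ranked probabilities. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and only the first five entries of the list above are copied in.

```python
from itertools import accumulate

# Predicted probabilities (%) for the first five ranked journals, copied from the list above.
ranked = [
    ("Genome Research", 22.6),
    ("Bioinformatics", 18.6),
    ("Nature Communications", 10.1),
    ("Genome Biology", 6.3),
    ("BMC Bioinformatics", 4.9),
]


def journals_to_reach(entries, threshold):
    """Return how many top-ranked entries are needed for the cumulative mass to reach threshold."""
    running_totals = accumulate(prob for _, prob in entries)
    for count, total in enumerate(running_totals, start=1):
        if total >= threshold:
            return count
    return None  # threshold not reached by the entries provided


top_k = journals_to_reach(ranked, 50.0)
print(f"Journals needed to reach 50% of the probability mass: {top_k}")
```

The first two entries sum to about 41.2%, so the third is the one that carries the running total past the 50% mark.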