Alignment-Free Microhaplotype Genotyping for GT-seq (Genotyping-in-Thousands by Sequencing) Using a Diploid Abundance Model

Campbell, N. R.; Campbell, A. R.; Blair, S. K.; Finger, A. J.

2026-04-03 genetics
10.64898/2026.04.01.715880 bioRxiv

GT-seq (Genotyping-in-Thousands by Sequencing) is widely used for high-throughput amplicon genotyping, but most analytical pipelines focus on single SNPs or rely on alignment-based variant calling. Here we present an alignment-free approach for microhaplotype genotyping that leverages the high read depth and low error rates typical of paired-end Illumina and Element sequencing. The pipeline first identifies primer-bounded reads and resolves paired-end sequences into complete amplicon sequences. Within each sample and locus, unique sequences are ranked by read abundance and the top one or two sequences are retained as candidate diploid alleles. These alleles are aggregated across samples to construct a catalog of unique haplotypes for each locus. In a second pass, reads are assigned to catalog haplotypes by exact sequence matching to produce diploid genotypes. Finally, catalog haplotype sequences are positionally compared to identify phased SNP and collapsed indel variation, generating compact microhaplotype representations suitable for population genetic analysis. This approach enables robust, alignment-free microhaplotype inference directly from high-depth amplicon sequencing data.
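The two-pass procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `min_ratio` threshold for accepting a second allele, and the restriction to equal-length haplotypes (no indel collapsing) are all assumptions made for illustration. Reads are assumed to be already primer-trimmed, merged amplicon sequences for a single locus.

```python
from collections import Counter


def candidate_alleles(reads, min_ratio=0.2):
    """Rank unique sequences by abundance; keep the top one or two.

    min_ratio is a hypothetical cutoff: the second sequence is kept only if
    its read count is at least min_ratio times the count of the first.
    """
    counts = Counter(reads).most_common(2)
    if not counts:
        return []
    alleles = [counts[0][0]]
    if len(counts) == 2 and counts[1][1] >= min_ratio * counts[0][1]:
        alleles.append(counts[1][0])
    return alleles


def build_catalog(samples):
    """First pass: pool candidate alleles across samples into a catalog.

    samples maps a sample name to its list of amplicon sequences.
    """
    catalog = []
    for reads in samples.values():
        for seq in candidate_alleles(reads):
            if seq not in catalog:
                catalog.append(seq)
    return catalog


def genotype(reads, catalog, min_ratio=0.2):
    """Second pass: exact-match reads to the catalog and call a diploid genotype.

    Returns a pair of catalog indices, or None if no read matches.
    """
    index = {seq: i for i, seq in enumerate(catalog)}
    matched = [r for r in reads if r in index]  # exact sequence matching only
    alleles = candidate_alleles(matched, min_ratio)
    if not alleles:
        return None
    ids = sorted(index[a] for a in alleles)
    return (ids[0], ids[-1])  # homozygotes come out as (i, i)


def variable_positions(catalog):
    """Positionally compare catalog haplotypes; return the differing columns.

    Assumption: all haplotypes have equal length. Indels are not handled here.
    """
    if len({len(s) for s in catalog}) > 1:
        raise ValueError("sketch handles equal-length haplotypes only")
    return [i for i, column in enumerate(zip(*catalog)) if len(set(column)) > 1]


def microhaplotype(seq, positions):
    """Compact representation: the bases of seq at the variable positions."""
    return "".join(seq[i] for i in positions)
```

Calling `build_catalog` on all samples first and then `genotype` on each sample mirrors the catalog-then-assign order of the abstract; low-abundance sequences that never enter the catalog are simply ignored in the second pass.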

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Nature Communications: 4913 papers in training set, Top 1%, 27.8%
2. The American Journal of Human Genetics: 206 papers in training set, Top 0.3%, 14.4%
3. Genome Biology: 555 papers in training set, Top 0.6%, 8.3%

50% of probability mass above

4. Nucleic Acids Research: 1128 papers in training set, Top 4%, 4.9%
5. Genome Medicine: 154 papers in training set, Top 1%, 4.9%
6. Bioinformatics: 1061 papers in training set, Top 5%, 4.3%
7. Genome Research: 409 papers in training set, Top 0.9%, 3.6%
8. PLOS ONE: 4510 papers in training set, Top 42%, 3.1%
9. Nature Genetics: 240 papers in training set, Top 3%, 2.6%
10. Genetics: 225 papers in training set, Top 2%, 1.7%
11. Nature Biotechnology: 147 papers in training set, Top 4%, 1.7%
12. eLife: 5422 papers in training set, Top 43%, 1.7%
13. Nature Methods: 336 papers in training set, Top 4%, 1.7%
14. Cell Genomics: 162 papers in training set, Top 4%, 1.5%
15. Science: 429 papers in training set, Top 16%, 1.3%
16. PLOS Genetics: 756 papers in training set, Top 12%, 1.1%
17. PLOS Computational Biology: 1633 papers in training set, Top 22%, 0.9%
18. Briefings in Bioinformatics: 326 papers in training set, Top 6%, 0.9%
19. BMC Bioinformatics: 383 papers in training set, Top 6%, 0.9%
20. Scientific Reports: 3102 papers in training set, Top 73%, 0.8%
21. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 44%, 0.8%
22. Molecular Ecology Resources: 161 papers in training set, Top 1%, 0.8%
23. Communications Biology: 886 papers in training set, Top 26%, 0.7%
24. BMC Genomics: 328 papers in training set, Top 7%, 0.6%
25. Cell: 370 papers in training set, Top 19%, 0.5%
26. Science Translational Medicine: 111 papers in training set, Top 8%, 0.5%
27. Science Advances: 1098 papers in training set, Top 35%, 0.5%