HiFi sequencing accurately identifies clinically relevant variants in paralogous genes

van der Sanden, B.; Betz, C.; Herzog, K.; Schamschula, E.; Wimmer, K.; Vater, I.; Balachandran, S.; Chen, X.; Corominas Galbany, J.; Timmermans, R.; Derks, R.; HiFi Solves EMEA Consortium; Spielmann, M.; Eberle, M. A.; Gilissen, C.; Vissers, L. E. L. M.; Zschocke, J.; Bolz, H. J.; Hoischen, A.

2025-10-31 genetic and genomic medicine
10.1101/2025.10.29.25339045 medRxiv

Short-read sequencing (SRS) methods have improved the detection of small genetic variants but remain limited in highly homologous genomic regions, such as segmental duplications with gene-pseudogene pairs. These paralogous regions often require complex, locus-specific assays for accurate analysis. Long-read genome sequencing (lrGS) technologies, such as PacBio HiFi sequencing, can span these regions but still face challenges in variant calling due to alignment ambiguities. Here, we evaluated PacBio HiFi lrGS combined with Paraphase, a dedicated haplotype-based variant caller, in 86 individuals with 125 known clinically relevant variants across 11 paralogous loci. Standard HiFi variant callers detected 95/125 variants, while the remaining 30 variants were only identified by Paraphase. Together, the standard variant callers and Paraphase detected all known variants, including SNVs, InDels, CNVs, SVs, and gene conversions. In addition, lrGS allowed accurate phasing and gene-pseudogene copy number detection. We demonstrate that PacBio HiFi lrGS, particularly when integrated with Paraphase, enables comprehensive variant detection in previously difficult-to-assess genomic regions. These results also suggest that lrGS is ready for a wider implementation, possibly as a first-tier diagnostic approach for individuals with suspected variants in these paralogous regions.

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass (53.1%).

1. Genome Medicine (154 papers in training set; Top 0.1%): 39.8%
2. The American Journal of Human Genetics (206 papers in training set; Top 0.7%): 6.9%
3. Genetics in Medicine (69 papers in training set; Top 0.3%): 6.4%
50% of probability mass above
4. Scientific Reports (3102 papers in training set; Top 32%): 3.9%
5. Med (38 papers in training set; Top 0.1%): 3.6%
6. The Journal of Molecular Diagnostics (36 papers in training set; Top 0.1%): 3.3%
7. Nature Communications (4913 papers in training set; Top 42%): 3.1%
8. Nucleic Acids Research (1128 papers in training set; Top 8%): 2.5%
9. Genome Biology (555 papers in training set; Top 3%): 2.1%
10. BMC Genomics (328 papers in training set; Top 2%): 2.1%
11. npj Genomic Medicine (33 papers in training set; Top 0.3%): 2.1%
12. Human Mutation (29 papers in training set; Top 0.3%): 2.1%
13. PLOS ONE (4510 papers in training set; Top 54%): 1.7%
14. Clinical Chemistry (22 papers in training set; Top 0.4%): 1.5%
15. Briefings in Bioinformatics (326 papers in training set; Top 5%): 1.3%
16. Genetics in Medicine Open (10 papers in training set; Top 0.1%): 1.3%
17. Cell Genomics (162 papers in training set; Top 4%): 1.3%
18. European Journal of Human Genetics (49 papers in training set; Top 1%): 0.9%
19. Bioinformatics Advances (184 papers in training set; Top 4%): 0.8%
20. Human Genetics (25 papers in training set; Top 0.3%): 0.8%
21. Communications Biology (886 papers in training set; Top 23%): 0.8%
22. Bioinformatics (1061 papers in training set; Top 9%): 0.8%
23. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 7%): 0.7%
24. PLOS Computational Biology (1633 papers in training set; Top 27%): 0.7%
25. Frontiers in Genetics (197 papers in training set; Top 12%): 0.5%
26. Frontiers in Bioinformatics (45 papers in training set; Top 1%): 0.5%
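
The "50% of probability mass" cutoff in the ranking can be reproduced with a short calculation. The sketch below assumes the cutoff is the smallest number of top-ranked journals whose cumulative predicted probability reaches 50%; the page does not state how it is computed, and the function name `journals_to_reach` is hypothetical. Only the first five listed probabilities are used.

```python
# (journal, predicted probability in percent), in the ranked order listed above.
ranked = [
    ("Genome Medicine", 39.8),
    ("The American Journal of Human Genetics", 6.9),
    ("Genetics in Medicine", 6.4),
    ("Scientific Reports", 3.9),
    ("Med", 3.6),
]


def journals_to_reach(ranked, threshold=50.0):
    """Return (count, cumulative %) for the shortest ranked prefix that
    reaches the threshold, or None if the list never reaches it.

    Hypothetical helper: assumes the cutoff is a simple cumulative sum.
    """
    total = 0.0
    for count, (_journal, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return count, round(total, 1)
    return None


print(journals_to_reach(ranked))
```

Under this assumption the first two journals sum to 46.7%, below the threshold, and adding the third brings the total to 53.1%, which matches the placement of the cutoff after rank 3.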