deCYPher: Star Allele-Resolution Computational Framework of Pharmacogenes for Haplotype-Resolved Long-Read Assemblies

Chang, T.-Y.; Liu, Y.-S.; Lai, H.-S.; Hung, T.-K.; Lin, H.-F.; Lin, Y.-H.; Hsu, C.-L.; Yang, Y.-C.; Chen, C.-Y.; Chen, P.-L.; Hsu, J. S.

2025-11-03 bioinformatics
10.1101/2025.10.13.681303 bioRxiv

Existing next-generation sequencing (NGS) tools such as Aldy and Cyrius have been applied to allele typing, but they cannot achieve complete accuracy because of genomic challenges including pseudogenes, structural variations, hybrid genes, copy number variations, and gene deletions. These complexities make accurate pharmacogene interpretation difficult, despite the crucial role pharmacogenomics plays in precision medicine. We developed deCYPher, a tool that generates personalized pharmacogenomic reports from haplotype-resolved assemblies. The tool supports analysis of all PharmVar 1A level genes, including CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP3A5, CYP4F2, DPYD, NUDT15, and SLCO1B1. Applied to all HPRC haplotypes (both release 1 and release 2 data), deCYPher demonstrated high accuracy in resolving complex gene structures. For CYP2D6, the release 1 data contained 6% gene multiplications, 6% full gene deletions, and 4% CYP2D6/CYP2D7 hybrids. By contrast, the release 2 data showed a higher prevalence of multiplications (14%) and hybrids (11%), while the frequency of full gene deletions remained comparable at 5%. Comparison with pb-StarPhase revealed discrepancies in 12 of 94 assemblies in the release 1 dataset. For instance, in sample HG02257, Aldy, Cyrius, and deCYPher consistently identified the genotype as *2/*35, whereas pb-StarPhase reported *2/*2. Notably, the *35-defining variants were present in the BAM and VCF files of the pb-StarPhase pipeline, but the local read depth over the *35-specific region was only 5x in HG02257-p. This suggests that the misclassification likely resulted from insufficient coverage, a known limitation of pb-StarPhase under low-depth conditions.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Genome Medicine (154 papers in training set; Top 0.1%): 40.9%
2. The American Journal of Human Genetics (206 papers in training set; Top 0.5%): 8.7%
3. Nature Communications (4913 papers in training set; Top 32%): 5.0%

50% of probability mass above

4. Clinical Pharmacology & Therapeutics (25 papers in training set; Top 0.1%): 3.7%
5. Bioinformatics (1061 papers in training set; Top 5%): 3.7%
6. BMC Bioinformatics (383 papers in training set; Top 3%): 3.2%
7. Bioinformatics Advances (184 papers in training set; Top 2%): 2.7%
8. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 3%): 2.0%
9. Nucleic Acids Research (1128 papers in training set; Top 9%): 2.0%
10. PLOS ONE (4510 papers in training set; Top 52%): 1.8%
11. Briefings in Bioinformatics (326 papers in training set; Top 4%): 1.8%
12. BMC Genomics (328 papers in training set; Top 2%): 1.8%
13. BioData Mining (15 papers in training set; Top 0.3%): 1.7%
14. Genome Research (409 papers in training set; Top 3%): 1.5%
15. Clinical and Translational Science (21 papers in training set; Top 0.5%): 1.4%
16. Journal of the American Medical Informatics Association (61 papers in training set; Top 1%): 1.4%
17. Genome Biology (555 papers in training set; Top 5%): 1.4%
18. Scientific Reports (3102 papers in training set; Top 65%): 1.3%
19. NAR Genomics and Bioinformatics (214 papers in training set; Top 3%): 1.2%
20. Cell Genomics (162 papers in training set; Top 6%): 0.8%
21. PLOS Computational Biology (1633 papers in training set; Top 25%): 0.7%
22. Computational and Structural Biotechnology Journal (216 papers in training set; Top 11%): 0.5%