
Toward Automatic Variant Interpretation: Discordant Genetic Interpretation Across Variant Annotations for ClinVar Pathogenic Variants

Chen, A. Y.-A.; Yuan, T.-H.; Huang, J.-H.; Wang, Y.-B.; Hung, T.-M.; Chen, C.-Y.; Hsu, J. S.; Chen, P.-L.

2024-10-15 genomics
10.1101/2024.10.11.617756 bioRxiv

Purpose: High-throughput sequencing has revolutionized the diagnosis of genetic disorders, but interpreting variant pathogenicity remains challenging. Although the Human Genome Variation Society (HGVS) provides recommendations for variant nomenclature, discrepancies in annotation remain a significant hurdle.
Methods: This study evaluated annotation concordance among three tools, ANNOVAR, SnpEff, and Variant Effect Predictor (VEP), using 164,549 two-star variants from ClinVar. The analysis used string-match comparisons of HGVS nomenclature to assess the consistency of each tool's annotations, the corresponding coding impacts, and the ACMG criteria inferred from the annotations.
Results: Concordance rates varied, with 58.52% agreement for HGVSc, 84.04% for HGVSp, and 85.58% for coding impact. SnpEff showed the highest match rate for HGVSc (0.988), while VEP performed best for HGVSp (0.977). Substantial discrepancies were noted in the Loss-of-Function (LoF) category. Incorrect PVS1 interpretations altered the final pathogenicity classification and downgraded pathogenic/likely pathogenic (PLP) variants (ANNOVAR 55.9%, SnpEff 66.5%, VEP 67.3%), risking false negatives for clinically relevant variants in reports.
Conclusions: These findings highlight the critical challenges that annotation discrepancies pose for accurate interpretation of variant pathogenicity. To enhance the reliability of genetic variant interpretation in clinical practice, standardizing transcript sets and systematically cross-validating results across multiple annotation tools are essential.
Graphic abstract (Figure 1): This study examined the consistency of variant annotations produced by three widely used open-source tools (ANNOVAR, SnpEff, and VEP) against 164,549 ClinVar two-star variants. The investigation covers HGVS-based transcript and protein nomenclature and coding impact annotation. None of the tools was fully consistent with ClinVar across all coding impact categories, and the LoF category showed the poorest consistency. This inconsistency may lead to discrepancies in PVS1 interpretation, affecting the final pathogenicity assessment. Loss of PVS1 resulted in substantial downgrading of PLP variants, potentially leading to the omission of clinically relevant variants from reports.
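
The string-match comparison described in the Methods can be illustrated with a minimal Python sketch. It assumes one plausible reading of the abstract: each tool's HGVS string is compared for exact equality with the ClinVar string, and overall agreement means every tool matches ClinVar. The record layout, the function names, and the toy variants (with placeholder transcript IDs) are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical record layout: one dict per variant, holding the ClinVar
# reference string and each tool's annotation for the same HGVS field.
Record = Dict[str, str]
TOOLS = ("annovar", "snpeff", "vep")


def tool_match_rate(records: List[Record], tool: str) -> float:
    """Fraction of variants whose tool string equals the ClinVar string exactly."""
    if not records:
        return 0.0
    hits = sum(
        1 for r in records
        if r.get(tool, "").strip() == r["clinvar"].strip()
    )
    return hits / len(records)


def overall_agreement(records: List[Record]) -> float:
    """Fraction of variants where every tool matches the ClinVar string."""
    if not records:
        return 0.0
    hits = sum(
        1 for r in records
        if all(r.get(t, "").strip() == r["clinvar"].strip() for t in TOOLS)
    )
    return hits / len(records)


# Invented toy data (not from the study); transcript IDs are placeholders.
toy = [
    {"clinvar": "NM_000000.1:c.100A>G", "annovar": "NM_000000.1:c.100A>G",
     "snpeff": "NM_000000.1:c.100A>G", "vep": "NM_000000.1:c.100A>G"},
    # ANNOVAR spells the deletion differently, so the string match fails.
    {"clinvar": "NM_000000.1:c.200del", "annovar": "NM_000000.1:c.200delT",
     "snpeff": "NM_000000.1:c.200del", "vep": "NM_000000.1:c.200del"},
    # VEP reports a different transcript version, so the string match fails.
    {"clinvar": "NM_000000.1:c.300dup", "annovar": "NM_000000.1:c.300dup",
     "snpeff": "NM_000000.1:c.300dup", "vep": "NM_000000.2:c.300dup"},
    {"clinvar": "NM_000000.1:c.400C>T", "annovar": "NM_000000.1:c.400C>T",
     "snpeff": "NM_000000.1:c.400C>T", "vep": "NM_000000.1:c.400C>T"},
]

rates = {tool: tool_match_rate(toy, tool) for tool in TOOLS}
agreement = overall_agreement(toy)
```

The toy data shows why overall agreement can fall well below each tool's individual match rate: a single divergent string from any one tool, whether from a different transcript version or a different spelling of the same change, breaks agreement for that variant.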

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Genetics in Medicine | 69 papers in training set | Top 0.1% | 22.7%
2. Human Mutation | 29 papers in training set | Top 0.1% | 19.6%
3. Journal of Medical Genetics | 28 papers in training set | Top 0.1% | 6.4%
4. BMC Medical Genomics | 36 papers in training set | Top 0.1% | 4.9%
50% of probability mass above
5. European Journal of Human Genetics | 49 papers in training set | Top 0.2% | 4.9%
6. The Journal of Molecular Diagnostics | 36 papers in training set | Top 0.1% | 3.6%
7. PLOS ONE | 4510 papers in training set | Top 42% | 3.1%
8. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 2.8%
9. Human Genomics | 21 papers in training set | Top 0.1% | 1.9%
10. BMC Bioinformatics | 383 papers in training set | Top 4% | 1.7%
11. Scientific Reports | 3102 papers in training set | Top 57% | 1.7%
12. Genome Medicine | 154 papers in training set | Top 4% | 1.7%
13. BioData Mining | 15 papers in training set | Top 0.4% | 1.5%
14. npj Genomic Medicine | 33 papers in training set | Top 0.4% | 1.5%
15. Frontiers in Genetics | 197 papers in training set | Top 7% | 1.2%
16. Genomics | 60 papers in training set | Top 2% | 1.1%
17. Clinical Chemistry | 22 papers in training set | Top 0.7% | 0.9%
18. Human Genetics | 25 papers in training set | Top 0.3% | 0.9%
19. Database | 51 papers in training set | Top 0.8% | 0.8%
20. Genetic Epidemiology | 46 papers in training set | Top 0.8% | 0.8%
21. Bioinformatics | 1061 papers in training set | Top 10% | 0.7%
22. BMC Genomics | 328 papers in training set | Top 6% | 0.7%
23. Journal of Personalized Medicine | 28 papers in training set | Top 2% | 0.6%
24. The American Journal of Human Genetics | 206 papers in training set | Top 4% | 0.6%
25. PLOS Computational Biology | 1633 papers in training set | Top 29% | 0.5%
26. European Radiology | 14 papers in training set | Top 0.9% | 0.5%
27. International Journal of Biological Macromolecules | 65 papers in training set | Top 5% | 0.5%
28. Genes | 126 papers in training set | Top 4% | 0.5%
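
The 50% cutoff marked in the list can be checked by accumulating the listed probabilities in rank order. The sketch below uses only the first five percentages shown above; the function name is hypothetical, and this is an illustrative check rather than a description of how the page computes its marker.

```python
from typing import List


def journals_to_reach(probabilities: List[float], threshold: float) -> int:
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or 0 if the threshold is never reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count
    return 0


# Predicted probabilities (in percent) for ranks 1 to 5, as listed above.
top_five = [22.7, 19.6, 6.4, 4.9, 4.9]
needed = journals_to_reach(top_five, 50.0)
```

With these values the first three journals sum to 48.7% and the first four to 53.6%, which is consistent with the statement that the top 4 journals account for 50% of the probability mass.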