Language models reveal evidence gaps in variants of uncertain significance

Li, W.; Bhat, V.; Yu, T.; Lebo, M.; Zitnik, M.; Cassa, C. A.

2026-03-02 genetic and genomic medicine

10.64898/2026.02.28.26347206 medRxiv

Background: Most rare coding variants in monogenic disease genes remain classified as Variants of Uncertain Significance (VUS), limiting their use in clinical care. Many variant classifications have been submitted to ClinVar, often with rich free-text summaries of the evidence underlying each classification. These narratives are not standardized and are difficult to mine systematically, making it challenging to identify variants that might be reclassified as new evidence becomes available.

Methods: We developed a two-stage language-model pipeline that (i) detects whether functional, population, or computational evidence is described in ClinVar and ClinGen variant summaries, and (ii) classifies whether it is evidence of pathogenicity or benignity. We first constructed Variant Evidence Text Annotations (VETA), a dataset of 44,522 ACMG/AMP keyword-description pairs derived from 18,678 ClinVar and ClinGen variant summaries using an LLM-based consensus annotation procedure. We then fine-tuned BioBERT-large models for each evidence type and stage, and validated performance using independent ClinGen expert-curated summaries as well as orthogonal variant-level evidence, including functional screening, computational scores, and population estimates of disease impact.

Results: Across evidence types, our models accurately identify whether functional, population, and computational evidence is present and whether it leans toward a pathogenic or benign impact. We find high agreement with ClinGen expert annotations and highly significant separation of validation scores between model-predicted benign and pathogenic groups (functional assays p = 8.13 × 10^-30, variant allele frequencies p = 4.11 × 10^-22, computational predictions p < 8.88 × 10^-16). We applied the full workflow to approximately 6,000 ClinVar VUS whose submission summaries lacked explicit functional or population evidence. By aggregating external functional, population, computational, and diagnostic evidence using the ACMG/AMP SVI point-based framework, we found that about 17% of these VUS meet quantitative thresholds for a likely benign or likely pathogenic classification, including 492 VUS in genes reviewed by ClinGen Variant Curation Expert Panels.

Conclusions: Transforming unstructured variant summaries into a structured evidence-type matrix enables scalable detection of evidence gaps, systematic integration of new data sources, and prioritization of the VUS that are most likely to be reclassified. This language model-enabled pipeline provides a generalizable digital approach to identify clinical evidence gaps as functional screens, biobank resources, and computational predictors continue to evolve.
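The point-based aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the point values and classification thresholds of the published ACMG/AMP point-based framework (Tavtigian et al., 2020); the function names, the evidence representation, and the example inputs are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Iterable, Tuple

# Assumed point values per evidence strength (Tavtigian et al., 2020).
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


def total_points(evidence: Iterable[Tuple[str, str]]) -> int:
    """Sum signed points over (direction, strength) evidence items.

    Pathogenic evidence adds points; benign evidence subtracts them.
    """
    total = 0
    for direction, strength in evidence:
        if direction not in ("pathogenic", "benign"):
            raise ValueError(f"unknown direction: {direction!r}")
        sign = 1 if direction == "pathogenic" else -1
        total += sign * POINTS[strength]
    return total


def classify(points: int) -> str:
    """Map a point total to a classification tier (assumed thresholds)."""
    if points >= 10:
        return "pathogenic"
    if points >= 6:
        return "likely pathogenic"
    if points >= 0:
        return "uncertain significance"
    if points >= -6:
        return "likely benign"
    return "benign"


# Hypothetical VUS: strong functional evidence plus moderate population
# evidence, both pathogenic, gives 4 + 2 = 6 points.
example = [("pathogenic", "strong"), ("pathogenic", "moderate")]
print(classify(total_points(example)))
```

Under these assumptions, a variant whose submission summary lacked functional or population evidence can cross a likely benign or likely pathogenic threshold once external evidence of sufficient strength is added to its total.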
