Transforming Semi-structured Variant Assessments into Computable Clinical Assertions: A Pilot Study for AI-Assisted Curation
Cannon, M. J.; Bratulin, A.; Kuzma, K.; Puthawala, D.; Corsmeier, D.; Schieffer, K.; Kelly, B.; Cottrell, C.; Wagner, A. H.
Genomic medicine relies on expert evaluation of genomic variants, but this process is slowed considerably by a lack of readily accessible genomic knowledge. Although genomic knowledge resources such as ClinVar and CIViC support structured data sharing and provide interfaces for adding structure, much of the variant interpretation data generated upstream is not readily interoperable with them, which limits the ability of clinical labs to share data and creates knowledge silos. Here we evaluate a strategy for breaking down these silos in a pilot study that transforms semi-structured variant classification knowledge into computable clinical assertions using the Global Alliance for Genomics and Health (GA4GH) Genomic Knowledge Standards specifications. We programmatically mapped previously captured somatic cancer clinical significance classifications from spreadsheets to the GA4GH Variant Annotation specification. For diagnostic classification data, this approach enabled reuse of standards-aware submission tooling to share 1,499 records with ClinVar. We then studied whether AI-assisted curation could fill the gaps left by unstructured text and enable scalable curation of prior classifications recorded in it. Using this approach, we accurately classified clinical significance for 71.8% (117/163) of randomly sampled prognostic evidence statements. We conclude with an overview of how this work may be generalized to make computationally inaccessible variant evidence from other clinical laboratories broadly reusable in downstream knowledgebases such as CIViC and ClinVar.
Matching journals
The top 7 journals account for 50% of the predicted probability mass.