
Mapping and Harmonization of CVX vaccine terms to the Vaccine Ontology

Pan, Y.; Manuel, W.; Abeysinghe, R.; Zheng, J.; Davydov, A.; Yang, Q.; Lin, A. Y.; Cui, L.; He, Y.

2025-07-18 bioinformatics
10.1101/2025.07.15.664501 bioRxiv

Background: With many vaccines developed and used, it is critical to standardize vaccine information. The OHDSI OMOP Common Data Model (CDM), widely used to support EHR data integration and analysis, leverages CVX, RxNorm, and RxNorm Extension codes to standardize vaccine-related records. However, these terminologies lack robust semantic relations, making the vaccine classification ineffective in OMOP CDM. To address this issue, our OHDSI Vaccine Vocabulary Working Group proposes to use the Vaccine Ontology (VO) to map these standards and build up its own semantic relations. As a first study of the work, we performed the mapping and alignment of the Vaccine Administered (CVX) codes with the VO using a combination of semi-automatic and manual mapping methods.

Results: A total of 273 CVX terms were first collected and classified. A high-level VO design pattern and an exact one-to-one mapping strategy were developed to guide the CVX-to-VO term mapping. To facilitate the manual mapping and harmonization process, we also developed and evaluated three semi-automated mapping approaches utilizing lexical and semantic information of vaccine concepts to map CVX to VO. These approaches suggested candidate VO mappings for CVX terms and also indicated CVX terms that were unmappable to VO and required new term additions to VO. The application of the best approach to the 2022-10-05 release of VO achieved an accuracy of 85.55% for its suggestions. The suggestions made by the semi-automated approaches were taken into account to further enhance the mappings, which led to our eventual mapping of all CVX terms to the latest version of VO. We innovatively proposed the inclusion of the passive vaccine branch in VO, which includes 24 immunoglobulins and antitoxins from CVX as passive vaccines. A specific CVX-VO OWL file was developed and added to the VO GitHub. Use case queries were developed to demonstrate its support for computer-assisted queries of vaccine groups based on CVX-VO hierarchies.

Conclusion: All CVX terms were mapped to the VO using our combined semi-automatic and manual mapping methods. The mapped results enhanced semantic vaccine classification, providing a basis for further OMOP vaccine classification and EHR data analysis.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Database | 51 papers in training set | Top 0.1% | 23.1%
2. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.3% | 8.6%
3. PLOS ONE | 4510 papers in training set | Top 21% | 8.6%
4. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.4% | 7.0%
5. JAMIA Open | 37 papers in training set | Top 0.3% | 4.1%
(50% of probability mass above)
6. Journal of Biomedical Informatics | 45 papers in training set | Top 0.4% | 3.7%
7. JMIR Medical Informatics | 17 papers in training set | Top 0.4% | 3.0%
8. GigaScience | 172 papers in training set | Top 0.7% | 2.8%
9. Vaccines | 196 papers in training set | Top 0.8% | 2.7%
10. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 3% | 2.1%
11. Scientific Data | 174 papers in training set | Top 0.9% | 1.9%
12. Journal of Medical Internet Research | 85 papers in training set | Top 2% | 1.7%
13. Nucleic Acids Research | 1128 papers in training set | Top 10% | 1.7%
14. BMC Bioinformatics | 383 papers in training set | Top 5% | 1.5%
15. Bioinformatics | 1061 papers in training set | Top 7% | 1.5%
16. Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.4%
17. Informatics in Medicine Unlocked | 21 papers in training set | Top 0.6% | 1.3%
18. BioData Mining | 15 papers in training set | Top 0.6% | 1.0%
19. Frontiers in Pharmacology | 100 papers in training set | Top 4% | 0.9%
20. PLOS Computational Biology | 1633 papers in training set | Top 22% | 0.9%
21. Scientific Reports | 3102 papers in training set | Top 70% | 0.9%
22. npj Digital Medicine | 97 papers in training set | Top 3% | 0.8%
23. Frontiers in Medicine | 113 papers in training set | Top 7% | 0.7%
24. Pharmacoepidemiology and Drug Safety | 13 papers in training set | Top 0.5% | 0.7%
25. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 6% | 0.7%
26. Open Forum Infectious Diseases | 134 papers in training set | Top 3% | 0.7%
27. Nature Communications | 4913 papers in training set | Top 67% | 0.5%
28. Briefings in Bioinformatics | 326 papers in training set | Top 8% | 0.5%
29. Bioinformatics Advances | 184 papers in training set | Top 5% | 0.5%
30. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 3% | 0.5%