CohortSymmetry: An R package to perform sequence symmetry analysis using the OMOP common data model

Chen, X.; Stanford, T.; Guo, Y.; Raventos, B.; Du, M.; Li, X.; Lam, A.; Corby, G.; Mercade-Besora, N.; Alcalde Herraiz, M.; Lopez-Guell, K.; Delmestri, A.; Man, W. Y.; Prieto-Alhambra, D.; Burn, E.; Catala, M.; Pratt, N.; Jodicke, A.; Newby, D.

2025-11-17 pharmacology and therapeutics
10.1101/2025.11.14.25340229 medRxiv

Background: Real-world data are valuable for detecting adverse drug events, and Sequence Symmetry Analysis (SSA) is a simple yet effective method frequently used for this purpose. However, heterogeneous implementations across studies limit reproducibility and scalability. To address this, we developed an open-source R package that standardises SSA analytics using data mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).

Methods: We developed CohortSymmetry, an R package that implements SSA for OMOP CDM data. The package was validated through unit testing and evaluated empirically by estimating adjusted sequence ratios (ASRs) with 95% confidence intervals (CIs) for 23 positive and 10 negative controls across six European databases, including CPRD GOLD (UK) and THIN® (Belgium, Italy, Romania, Spain, UK). Sensitivity and specificity were defined as the proportions of positive and negative controls correctly identified by SSA. Sensitivity analyses varied key parameters, including the washout period.

Results: CohortSymmetry passed high-coverage unit tests. Of 33 eligible controls, four showed results consistent with expectations across all databases; for example, the amiodarone-levothyroxine pair had a lower 95% CI bound >1 in each. Sensitivity was moderate, whereas specificity was high in the primary analyses. Parameter variation influenced outcomes; a 365-day prior observation requirement reduced specificity in CPRD GOLD from 75% to 38%.

Conclusions: CohortSymmetry enables reproducible SSA using OMOP CDM data. Differences across databases likely reflect heterogeneity in data capture and prescribing patterns. Limitations include residual data variability and SSA's susceptibility to time-varying confounding, underscoring the need for tailored analytic design in pharmacovigilance studies.

Key Messages
- We developed CohortSymmetry, an open-source R package that standardises SSA analytics using OMOP CDM-mapped data and verified the correctness of functions via unit testing and application to real-world datasets.
- CohortSymmetry passed high-coverage tests, and among 33 selected controls, four showed results consistent with expectations across all databases; varying analytical parameters affected results.
- The package provides a reproducible and scalable framework for multi-database SSA studies, supporting robust pharmacovigilance, but careful specification of parameters is required to account for the characteristics of the medical domain under investigation.
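
The abstract defines sensitivity and specificity as the proportions of positive and negative controls correctly identified by SSA. The sketch below illustrates that arithmetic in Python. It is not the package's code and uses none of its API: the `ControlResult` type, the function names, and the toy values are all hypothetical. It also assumes a decision rule the abstract only hints at, namely that a signal is flagged when the lower 95% CI bound of the ASR exceeds 1.

```python
from dataclasses import dataclass


@dataclass
class ControlResult:
    """One drug pair evaluated by SSA (hypothetical structure)."""
    name: str
    is_positive_control: bool
    ci_lower: float  # lower bound of the 95% CI for the adjusted sequence ratio


def signal_detected(result: ControlResult) -> bool:
    # Assumed decision rule: flag a signal when the lower CI bound exceeds 1.
    return result.ci_lower > 1.0


def sensitivity_specificity(results: list[ControlResult]) -> tuple[float, float]:
    """Proportion of positive controls flagged and of negative controls not flagged."""
    positives = [r for r in results if r.is_positive_control]
    negatives = [r for r in results if not r.is_positive_control]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative control")
    true_positives = sum(signal_detected(r) for r in positives)
    true_negatives = sum(not signal_detected(r) for r in negatives)
    return true_positives / len(positives), true_negatives / len(negatives)


# Toy values invented for illustration only; they are not results from the study.
toy_results = [
    ControlResult("positive A", True, 1.40),
    ControlResult("positive B", True, 0.90),
    ControlResult("positive C", True, 1.10),
    ControlResult("positive D", True, 0.95),
    ControlResult("negative A", False, 0.80),
    ControlResult("negative B", False, 1.05),
    ControlResult("negative C", False, 0.70),
    ControlResult("negative D", False, 0.60),
]

sensitivity, specificity = sensitivity_specificity(toy_results)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

With these toy inputs, two of four positive controls and three of four negative controls are classified as expected. A real analysis would compute one such pair of proportions per database and per parameter setting.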

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Pharmacoepidemiology and Drug Safety (13 papers in training set; Top 0.1%; 32.3%)
2. Clinical Pharmacology & Therapeutics (25 papers in training set; Top 0.1%; 14.1%)
3. British Journal of Clinical Pharmacology (21 papers in training set; Top 0.1%; 9.0%)
50% of probability mass above
4. Clinical and Translational Science (21 papers in training set; Top 0.1%; 8.2%)
5. PLOS ONE (4510 papers in training set; Top 41%; 3.5%)
6. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.8%; 3.2%)
7. Journal of Biomedical Informatics (45 papers in training set; Top 0.6%; 2.7%)
8. BioData Mining (15 papers in training set; Top 0.1%; 2.7%)
9. Frontiers in Pharmacology (100 papers in training set; Top 1%; 2.7%)
10. Trials (25 papers in training set; Top 0.5%; 2.5%)
11. BMJ Open (554 papers in training set; Top 7%; 2.5%)
12. BMC Medical Research Methodology (43 papers in training set; Top 1%; 0.9%)
13. British Journal of General Practice (22 papers in training set; Top 0.5%; 0.8%)
14. JAMIA Open (37 papers in training set; Top 1%; 0.8%)
15. Journal of Medical Internet Research (85 papers in training set; Top 4%; 0.8%)
16. American Journal of Gastroenterology (15 papers in training set; Top 0.3%; 0.7%)
17. Nature Communications (4913 papers in training set; Top 64%; 0.7%)
18. Journal of Personalized Medicine (28 papers in training set; Top 1%; 0.7%)
19. Journal of Translational Medicine (46 papers in training set; Top 3%; 0.7%)
20. Pilot and Feasibility Studies (12 papers in training set; Top 0.7%; 0.7%)
21. Bioinformatics (1061 papers in training set; Top 10%; 0.6%)
22. npj Digital Medicine (97 papers in training set; Top 4%; 0.6%)