
preSCRIPT: Large-scale prescription search and annotation engine for pharmacogenomic studies

Pieczarka, M.; Pienkowski, P.; Konowalska, P.; Grubarek, S.; Hajto, J.; Hoinkis, D.; Piechota, M.; Borczyk, M.; Korostynski, M.

2026-04-29 genetic and genomic medicine
10.64898/2026.04.28.26351989 medRxiv

Pharmacogenetics (PGx) has traditionally focused on a small number of high-impact variants affecting drug response because PGx studies are labor-intensive and therefore low-throughput. Population biobanks linked to electronic health records (EHRs), including the UK Biobank (UKB) with prescription data for ~230,000 individuals, offer opportunities to scale PGx research. This comes with a challenge, however, as EHRs do not provide direct treatment response outcomes. One way to overcome this is to derive indirect drug response phenotypes from prescription records. Here, we propose preSCRIPT, a framework to filter and annotate raw prescriptions from the UKB and derive phenotypes for analysis; it includes an algorithm to distinguish short prescription gaps from true dose changes. As a proof of concept, we applied preSCRIPT to warfarin, paracetamol, codeine, amitriptyline, simvastatin, aspirin, and amlodipine and derived two phenotypes: therapy length and median daily dose. We tested associations for those seven drugs and two phenotypes across SNPs, cytochrome P450 (CYP) genes, and HLA alleles. We replicated known associations such as CYP2D6 variants with amitriptyline therapy length and dose, CYP2C9/CYP4F2/CYP2C19 with warfarin dose, and CYP2D6 with codeine dose. For drugs without formal PGx guidelines, we identified an association between CYP2D6 activity and aspirin therapy length, as well as several SNP associations, including rs62471929 (CYP3A5), a variant for amlodipine dose that was replicated in an independent hold-out set. Overall, our study shows that preSCRIPT can recover established PGx associations and prioritize exploratory novel candidate loci, and that it may serve as a tool for large-scale pharmacogenomics.
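To make the two phenotypes named in the abstract more concrete, the following is a minimal sketch of how therapy length and median daily dose could be derived from dated prescription records. It is not the preSCRIPT algorithm, which the abstract does not specify. The record layout (issue date, daily dose in mg, days supplied), the function name `summarize_therapy`, and the 30-day gap tolerance `max_gap_days` are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import median


def summarize_therapy(records, max_gap_days=30):
    """Return (therapy_length_days, median_daily_dose) for the first
    continuous treatment episode.

    Hypothetical record layout: (issue_date, daily_dose_mg, days_supplied).
    A gap between the end of the covered period and the next issue date
    is tolerated if it is at most max_gap_days (an assumed threshold);
    a longer gap is treated as the end of the episode.
    """
    records = sorted(records)
    if not records:
        return 0, None

    start, first_dose, first_days = records[0]
    covered_until = start + timedelta(days=first_days)
    doses = [first_dose]

    for issued, dose, days in records[1:]:
        gap = (issued - covered_until).days
        if gap > max_gap_days:
            break  # long gap: stop at the end of the first episode
        covered_until = max(covered_until, issued + timedelta(days=days))
        doses.append(dose)

    return (covered_until - start).days, median(doses)


# Illustrative, made-up records (not UKB data).
example = [
    (date(2020, 1, 1), 5.0, 28),
    (date(2020, 2, 5), 5.0, 28),   # 7-day gap, tolerated
    (date(2020, 3, 10), 7.5, 28),  # 6-day gap, dose change
    (date(2021, 1, 1), 5.0, 28),   # long gap, excluded from the episode
]
length_days, median_dose = summarize_therapy(example)
print(length_days, median_dose)
```

In this sketch a short gap only extends the episode, while the dose recorded on each prescription feeds the median, so a brief lapse in refills does not by itself register as a change in dose.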

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Genome Medicine | 154 papers in training set | Top 0.1% | 26.9%
2. Bioinformatics | 1061 papers in training set | Top 2% | 15.3%
3. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.3% | 8.7%
50% of probability mass above
4. Nature Genetics | 240 papers in training set | Top 2% | 4.5%
5. Nature Communications | 4913 papers in training set | Top 44% | 2.8%
6. Scientific Reports | 3102 papers in training set | Top 44% | 2.7%
7. Bioinformatics Advances | 184 papers in training set | Top 2% | 2.5%
8. Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.2%
9. The American Journal of Human Genetics | 206 papers in training set | Top 2% | 2.2%
10. PLOS ONE | 4510 papers in training set | Top 50% | 1.9%
11. BMC Medical Genomics | 36 papers in training set | Top 0.4% | 1.8%
12. npj Digital Medicine | 97 papers in training set | Top 2% | 1.8%
13. Communications Medicine | 85 papers in training set | Top 0.2% | 1.8%
14. BMC Genomics | 328 papers in training set | Top 3% | 1.4%
15. PLOS Computational Biology | 1633 papers in training set | Top 21% | 1.0%
16. PLOS Genetics | 756 papers in training set | Top 12% | 1.0%
17. eLife | 5422 papers in training set | Top 51% | 1.0%
18. IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.4% | 0.9%
19. Journal of Biomedical Informatics | 45 papers in training set | Top 1% | 0.9%
20. Frontiers in Genetics | 197 papers in training set | Top 8% | 0.9%
21. iScience | 1063 papers in training set | Top 25% | 0.9%
22. Nature Human Behaviour | 85 papers in training set | Top 4% | 0.8%
23. Genetic Epidemiology | 46 papers in training set | Top 0.7% | 0.8%
24. Cell Genomics | 162 papers in training set | Top 6% | 0.8%
25. European Journal of Human Genetics | 49 papers in training set | Top 1% | 0.7%
26. BioData Mining | 15 papers in training set | Top 0.9% | 0.7%
27. Nucleic Acids Research | 1128 papers in training set | Top 18% | 0.7%
28. Science Translational Medicine | 111 papers in training set | Top 7% | 0.7%
29. Frontiers in Molecular Biosciences | 100 papers in training set | Top 6% | 0.7%
30. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.5%