
Harmonising UK primary care prescription records for research: A case study in the UK Biobank

Ytsma, C. R.; Torralbo, A.; Fitzpatrick, N. K.; Pietzner, M.; Louloudis, I.; Nguyen, D.; Ansarey, S.; Denaxas, S.

2026-04-22 health informatics
10.64898/2026.04.21.26351274 medRxiv

Objective: This study aimed to develop and validate an automated, scalable framework to harmonise fragmented UK primary care prescription records into a research-ready dataset by mapping four diverse medical ontologies to a unified, historically comprehensive reference standard.

Materials and Methods: We used raw prescription records for consented participants in the UK Biobank, in which participants are uniquely characterised by multiple data modalities. Primary care data were preprocessed by selecting one drug code where multiple were recorded, cleaning codes to match reference presentations, expanding code granularity based on drug descriptions, and updating outdated codes to a single reference version. Harmonisation entailed mapping British National Formulary (BNF) and Read2 codes to dm+d, the universal NHS standard vocabulary for uniquely identifying and prescribing medicines. Harmonised dm+d records were then homogenised to a single concept granularity, the Virtual Medicinal Product (VMP). We validated our methods by creating medication profiles mapping contemporary drug prescribing patterns in 312 physical and mental health conditions.

Results: We preprocessed 57,659,844 records (100%) from 221,868 participants (100%). Of those, 48,950 records were dropped for lack of a drug code, and 7,357,572 records (13%) used multiple ontologies. Most records (76%) were encoded in BNF, and nearly half of all records (N=28,034,282; 49%) had their code granularity expanded via the drug description. In total, 41,244,315 records (72%) were harmonised to dm+d, and 99.98% of these were converted to VMP as a homogeneous dataset. Across 312 diseases, we identified 23,352 disease-drug associations with 237 medications (represented as BNF subparagraphs) that survived statistical correction, most of which resembled drug-indication pairs.

Conclusion: Our methodology converts highly fragmented, raw prescription records of inconsistent data quality into a streamlined, enriched dataset at a single reference, version, and granularity of information. Researchers can readily use the harmonised prescription records for large-scale analyses.
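
The preprocessing and harmonisation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup tables, the identifiers in them (made-up placeholders, not real BNF, Read2, or dm+d codes), the `Record` type, the `harmonise` function, and the order in which a single code is preferred are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Toy lookup tables with made-up identifiers (NOT real BNF, Read2, or dm+d codes).
BNF_TO_DMD = {"BNF-A": "AMP-1", "BNF-B": "VMP-2"}
READ2_TO_DMD = {"R2-X": "AMP-3"}
# Hypothetical roll-up from any dm+d concept to its Virtual Medicinal Product.
DMD_TO_VMP = {"AMP-1": "VMP-1", "VMP-2": "VMP-2", "AMP-3": "VMP-2"}


@dataclass
class Record:
    """One prescription record, possibly coded in several ontologies."""
    bnf: Optional[str] = None
    read2: Optional[str] = None
    dmd: Optional[str] = None


def harmonise(rec: Record) -> Optional[str]:
    """Return a VMP identifier, or None if the record cannot be mapped."""
    # Step 1: select one drug code. Preferring dm+d, then BNF, then Read2
    # is an assumed order, not one stated in the abstract.
    if rec.dmd:
        dmd = rec.dmd.strip()
    elif rec.bnf:
        # Step 2: clean the code (here only whitespace), then map to dm+d.
        dmd = BNF_TO_DMD.get(rec.bnf.strip())
    elif rec.read2:
        dmd = READ2_TO_DMD.get(rec.read2.strip())
    else:
        return None  # no drug code at all: the record is dropped
    if dmd is None:
        return None  # code present but not harmonised to dm+d
    # Step 3: homogenise to a single granularity (VMP).
    return DMD_TO_VMP.get(dmd)


records = [Record(bnf=" BNF-A "), Record(read2="R2-X"), Record(), Record(bnf="BNF-Z")]
vmps = [harmonise(r) for r in records]
```

In this toy run, the two mappable records resolve to a VMP, while the record without a code and the record with an unmapped code both yield `None`, mirroring the dropped and unharmonised categories reported in the Results.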

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. British Journal of General Practice, 22 papers in training set, Top 0.1%, 14.2%
2. npj Digital Medicine, 97 papers in training set, Top 0.4%, 12.6%
3. Journal of the American Medical Informatics Association, 61 papers in training set, Top 0.3%, 10.3%
4. International Journal of Medical Informatics, 25 papers in training set, Top 0.2%, 6.3%
5. The Lancet Digital Health, 25 papers in training set, Top 0.1%, 6.3%
6. BMC Medical Informatics and Decision Making, 39 papers in training set, Top 0.5%, 6.3%

50% of probability mass above

7. JAMIA Open, 37 papers in training set, Top 0.3%, 4.8%
8. Wellcome Open Research, 57 papers in training set, Top 0.2%, 4.8%
9. BMJ Open, 554 papers in training set, Top 7%, 2.7%
10. JMIR Medical Informatics, 17 papers in training set, Top 0.5%, 2.3%
11. Journal of Biomedical Informatics, 45 papers in training set, Top 0.7%, 2.1%
12. PLOS ONE, 4510 papers in training set, Top 51%, 1.9%
13. Nature Communications, 4913 papers in training set, Top 50%, 1.8%
14. PLOS Digital Health, 91 papers in training set, Top 2%, 1.5%
15. Journal of Medical Internet Research, 85 papers in training set, Top 3%, 1.5%
16. BMC Medicine, 163 papers in training set, Top 5%, 1.2%
17. Pharmacoepidemiology and Drug Safety, 13 papers in training set, Top 0.3%, 1.2%
18. Scientific Data, 174 papers in training set, Top 2%, 1.2%
19. European Respiratory Journal, 54 papers in training set, Top 1%, 1.1%
20. Scientific Reports, 3102 papers in training set, Top 69%, 0.9%
21. Frontiers in Digital Health, 20 papers in training set, Top 1%, 0.9%
22. Clinical and Translational Science, 21 papers in training set, Top 0.8%, 0.9%
23. BMJ Health & Care Informatics, 13 papers in training set, Top 0.8%, 0.8%
24. JMIR Public Health and Surveillance, 45 papers in training set, Top 4%, 0.7%
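
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked against the listed values with a short cumulative sum. This is a sketch under assumptions: the percentages are taken as displayed (so they are rounded), and `journals_to_reach` is a hypothetical helper, not part of the site that produced the list.

```python
# Predicted probabilities (%) for the ten highest-ranked journals, as listed.
probabilities = [14.2, 12.6, 10.3, 6.3, 6.3, 6.3, 4.8, 4.8, 2.7, 2.3]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative %) at which the running total first meets the threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, round(total, 1)
    return None  # threshold never reached with the values supplied


rank, cumulative = journals_to_reach(probabilities, 50.0)
print(rank, cumulative)
```

With the displayed values, the running total is 49.7% after five journals and 56.0% after six, so the sixth journal is where the 50% threshold is first crossed.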