From claims to care: Machine learning algorithm to classify urinary tract infection cases using Swiss health insurance data

Aghlmandi, S.; Shafiezadeh, S.; Huber, C.; Godet, P.; Bucher, H. C.; Bielicki, J. A.

2025-10-01 primary care research
10.1101/2025.09.29.25336862 medRxiv

Objectives: To evaluate whether machine learning (ML) applied to comprehensive claims data without diagnostic codes can distinguish a high proportion of antibiotic treatment episodes as urinary tract infection (UTI) or non-UTI cases. Such approaches may be valuable for antimicrobial stewardship when diagnosis-linked datasets are unavailable.

Methods: Outpatient antibiotic prescription claims from three major Swiss insurers (2017-2020; ~40% of the Swiss population) were analyzed. Based on clinical input, specific constellations of claims codes (e.g. positive urine culture plus typical antibiotic) were a priori assigned as indicating UTI episodes, providing the reference classification. Predictors included sex, age group, comorbidity, and diagnostic tests ordered during the episode. Four ML classifiers were tested; performance and interpretability were evaluated, with XGBoost prioritized.

Results: After cleaning and balancing, 38,982 records (19,491 UTI; 19,491 non-UTI) were included. XGBoost achieved an AUC of 0.94, accuracy of 87.6%, sensitivity of 79.2%, and specificity of 96.1%. Misclassification was asymmetric: 11% of non-UTI cases were labeled UTI, while 2% of UTI cases were misclassified as non-UTI. Diagnostics ordered were the strongest predictors, followed by female sex and older age.

Conclusions: Even in the absence of diagnosis codes, ML applied to claims data can reliably identify UTI-related prescriptions. This supports the feasibility of claims-based surveillance tools for stewardship, while in parallel highlighting the need for scalable, low-burden approaches to improve direct diagnostic coding in routine data.
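As a quick check on how the reported metrics relate, the sketch below recomputes overall accuracy from the stated sensitivity, specificity, and class sizes. It is only an illustration: the function name `accuracy_from_rates` is hypothetical, and the inputs are the rounded figures from the abstract rather than the authors' underlying counts.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by per-class rates and class sizes."""
    # Correctly classified positives plus correctly classified negatives.
    correct = sensitivity * n_pos + specificity * n_neg
    return correct / (n_pos + n_neg)


# Rounded values taken from the abstract (assumption: UTI is the positive class).
acc = accuracy_from_rates(
    sensitivity=0.792,
    specificity=0.961,
    n_pos=19_491,  # UTI records
    n_neg=19_491,  # non-UTI records
)
print(f"implied accuracy: {acc:.4f}")
```

Because the two classes are the same size, the implied accuracy is simply the mean of sensitivity and specificity, which lands close to the reported 87.6%.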

Matching journals

The top 7 journals account for 50% of the predicted probability mass.
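The cutoff can be checked with a running sum over the predicted probabilities in the ranked list that follows. This is a minimal sketch: the helper name `journals_to_reach` is hypothetical, and only the first ten probabilities from the list are included.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the ranked list.
probs = [19.9, 6.5, 6.5, 5.0, 5.0, 5.0, 4.4, 4.1, 2.1, 2.1]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative mass) at the first rank whose running sum meets the threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None  # threshold not reached within the supplied ranks


k, total = journals_to_reach(probs, 50.0)
print(f"{k} journals reach {total:.1f}% of the probability mass")
```

With these values the running sum first crosses 50% at rank 7, at roughly 52.3%.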

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | The Lancet Digital Health | 25 | Top 0.1% | 19.9%
2 | Journal of Infection | 71 | Top 0.1% | 6.5%
3 | BMC Medicine | 163 | Top 0.5% | 6.5%
4 | Scientific Reports | 3102 | Top 22% | 5.0%
5 | PLOS ONE | 4510 | Top 30% | 5.0%
6 | JMIR Public Health and Surveillance | 45 | Top 0.3% | 5.0%
7 | Communications Medicine | 85 | Top 0.1% | 4.4%
(50% of probability mass above)
8 | Antibiotics | 32 | Top 0.3% | 4.1%
9 | Frontiers in Medicine | 113 | Top 2% | 2.1%
10 | Wellcome Open Research | 57 | Top 0.6% | 2.1%
11 | British Journal of General Practice | 22 | Top 0.2% | 1.9%
12 | Journal of Medical Internet Research | 85 | Top 2% | 1.9%
13 | Journal of Clinical Medicine | 91 | Top 3% | 1.8%
14 | International Journal of Medical Informatics | 25 | Top 0.8% | 1.7%
15 | BMC Infectious Diseases | 118 | Top 3% | 1.7%
16 | PLOS Digital Health | 91 | Top 1% | 1.7%
17 | BMC Medical Research Methodology | 43 | Top 0.7% | 1.4%
18 | Journal of the American Medical Informatics Association | 61 | Top 1% | 1.4%
19 | BMJ Open | 554 | Top 10% | 1.4%
20 | Archives of Clinical and Biomedical Research | 28 | Top 1% | 1.3%
21 | BJGP Open | 12 | Top 0.5% | 1.1%
22 | Clinical Infectious Diseases | 231 | Top 4% | 0.9%
23 | PLOS Medicine | 98 | Top 4% | 0.8%
24 | Kidney360 | 22 | Top 0.5% | 0.8%
25 | ERJ Open Research | 44 | Top 0.7% | 0.8%
26 | eLife | 5422 | Top 55% | 0.8%
27 | Journal of Antimicrobial Chemotherapy | 43 | Top 0.4% | 0.8%
28 | Frontiers in Public Health | 140 | Top 8% | 0.8%
29 | Cureus | 67 | Top 5% | 0.8%
30 | Eurosurveillance | 80 | Top 1% | 0.8%