From claims to care: Machine learning algorithm to classify urinary tract infection cases using Swiss health insurance data
Aghlmandi, S.; Shafiezadeh, S.; Huber, C.; Godet, P.; Bucher, H. C.; Bielicki, J. A.
Objectives: To evaluate whether machine learning (ML) applied to comprehensive claims data without diagnostic codes can distinguish a high proportion of antibiotic treatment episodes as urinary tract infection (UTI) or non-UTI cases. Such approaches may be valuable for antimicrobial stewardship when diagnosis-linked datasets are unavailable.
Methods: Outpatient antibiotic prescription claims from three major Swiss insurers (2017-2020; ~40% of the Swiss population) were analyzed. Based on clinical input, specific constellations of claims codes (e.g., positive urine culture plus typical antibiotic) were a priori assigned as indicating UTI episodes, providing the reference classification. Predictors included sex, age group, comorbidity, and diagnostic tests ordered during the episode. Four ML classifiers were tested; performance and interpretability were evaluated, with XGBoost prioritized.
Results: After cleaning and balancing, 38,982 records (19,491 UTI; 19,491 non-UTI) were included. XGBoost achieved an AUC of 0.94, accuracy of 87.6%, sensitivity of 79.2%, and specificity of 96.1%. Misclassification was asymmetric: 11% of non-UTI cases were labeled UTI, while 2% of UTI cases were misclassified as non-UTI. Diagnostics ordered were the strongest predictors, followed by female sex and older age.
Conclusions: Even in the absence of diagnosis codes, ML applied to claims data can reliably identify UTI-related prescriptions. This supports the feasibility of claims-based surveillance tools for stewardship, while in parallel highlighting the need for scalable, low-burden approaches to improve direct diagnostic coding in routine data.
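As a reading aid for the reported metrics: on a class-balanced sample, overall accuracy is the mean of sensitivity and specificity. The following is a minimal Python sketch of that relationship, not the authors' code; `overall_accuracy` is a hypothetical helper name, and the inputs are the figures quoted in the abstract.

```python
def overall_accuracy(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity for given class sizes.

    Assumes sensitivity is the fraction of the n_pos positive cases that are
    classified correctly, and specificity the same for the n_neg negatives.
    """
    correct = sensitivity * n_pos + specificity * n_neg
    return correct / (n_pos + n_neg)


# Figures as quoted in the abstract: 19,491 records per class,
# sensitivity 79.2%, specificity 96.1%.
acc = overall_accuracy(0.792, 0.961, 19_491, 19_491)
print(f"Implied accuracy: {acc:.4f}")
```

With equal class sizes the result reduces to (0.792 + 0.961) / 2, about 0.8765, which is in line with the reported accuracy of 87.6%.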