
Prognosis of stroke subtypes in whole population health systems data: a matched cohort study

Hosking, A.; Iveson, M. H.; Sherlock, L.; Mukherjee, M.; Grover, C.; Alex, B.; Parepalli, S.; Mair, G.; Doubal, F.; Whalley, H. C.; Tobin, R.; Wardlaw, J. M.; Al-Shahi Salman, R.; Whiteley, W. N.

2026-04-20 neurology

10.64898/2026.04.17.26351150 medRxiv


Background: Outcomes after stroke vary by location-based stroke subtype, but studies using healthcare systems data do not include subtyping information. We linked natural language processing (NLP) of brain imaging reports to routinely collected data to estimate the risk of death and other outcomes after stroke subtypes in a nationwide dataset.

Methods: We applied a previously validated NLP algorithm to all CT and MRI head scan reports in Scotland between 2010 and 2018. We linked the reports to hospital readmissions, prescriptions and death data to identify and characterize people with stroke, and to categorize them into deep and cortical ischemic stroke, deep and lobar intracerebral hemorrhage (ICH), subarachnoid hemorrhage, and subdural hemorrhage. We used a matched cohort design, matching each case by age and sex to four controls who had never had a stroke. By subtype, we estimated the risk of rehospitalization with stroke, myocardial infarction (MI), cancer, dementia, epilepsy and death, accounting for confounders and the competing risk of death.

Results: From 785,331 people with a head scan, we identified 64,219 with clinical stroke phenotypes (mean age 73.4 years, 49.5% male), and subtyped 12,616 with deep ischemic stroke; 14,103 with cortical ischemic stroke; 1,814 with deep ICH; and 1,456 with lobar ICH. There was a higher absolute rate of 1-year hospital readmission for lobar compared with deep ICH (4.9% [95% CI 3.9%-6.1%] vs 3.4% [2.6%-4.3%]); a higher risk of dementia beyond 6 months after lobar ICH, compared with controls, than after other stroke subtypes (aHR 3.5 [2.3-5.3]); and a higher risk of MI within 6 months of cortical ischemic stroke than after other stroke subtypes (aHR 4.6 [3.4-6.3]).

Conclusions: NLP of free-text reports linked to coded data successfully subtyped stroke at scale, and we estimated the risk of clinically relevant outcomes. Future work should use free text to enable large-scale audit and epidemiology of people with stroke.

