
Detecting simulated pathogen releases in a real-world health data set

Moss, R.; Testolin, M. J.; Pitsaris, C.; Hill, A. M.; Muscatello, D. J.; McCaw, J. M.; Dawson, P.

2026-05-17 infectious diseases
10.64898/2026.05.12.26350999 medRxiv

The purpose of electronic disease syndromic surveillance (EDSyS) systems is to detect hazardous pathogens and other unusual signals in health surveillance data before such events are identified by an individual clinician or healthcare facility. However, EDSyS systems have primarily been evaluated using simulated health surveillance data, which do not necessarily capture the richness and complexities of real-world health data. We have updated and extended an existing EDSyS system, EpiDefend, which combines ensemble forecasting and recursive Bayesian estimation in a particle filter framework that supports demographic and spatial structure. We simulated the release of several pathogens, both infectious and non-infectious, and injected the resulting cases into a real-world health data set. Here we evaluate EpiDefend's sensitivity and specificity in detecting these simulated releases, and measure the time to detection against pathogen-specific estimates of the time to clinical detection, as informed by clinicians and microbiologists. We show that for diseases where clinical diagnosis can be challenging, such as Q fever (Coxiella burnetii) and tularaemia (Francisella tularensis), EpiDefend can reliably beat the time to clinical detection. In contrast, for pathogens that can be clinically diagnosed relatively quickly, such as inhalational anthrax and pneumonic plague, it is extremely difficult to beat the time to clinical detection. Our results suggest that EpiDefend may be able to reliably detect real-world introductions or releases of some pathogens at low false-alarm rates before a clinical diagnosis would be confirmed, and this would represent a landmark achievement for EDSyS systems.
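The evaluation quantities named in the abstract (sensitivity, specificity, and time to detection relative to clinical detection) can be illustrated with a minimal sketch. This is not EpiDefend's code: the `Run` record, the `summarise` function, and the example values are all hypothetical, and the definitions used here (an alarm on or after the release day counts as a detection; any alarm in a no-release run counts as a false alarm) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One hypothetical surveillance run (illustrative, not from the paper)."""
    release_day: Optional[int]  # None means no simulated release was injected
    alarm_day: Optional[int]    # None means the system never raised an alarm


def summarise(runs: List[Run], clinical_delay_days: int) -> dict:
    """Summarise detection performance under the assumed definitions."""
    release = [r for r in runs if r.release_day is not None]
    control = [r for r in runs if r.release_day is None]

    # Assumption: only alarms on or after the release day count as detections.
    detected = [
        r for r in release
        if r.alarm_day is not None and r.alarm_day >= r.release_day
    ]
    delays = [r.alarm_day - r.release_day for r in detected]

    return {
        # Fraction of release runs in which the release was detected.
        "sensitivity": len(detected) / len(release),
        # Fraction of no-release runs in which no alarm was raised.
        "specificity": sum(r.alarm_day is None for r in control) / len(control),
        # Days from release to alarm, for detected releases only.
        "delays": delays,
        # Fraction of release runs detected sooner than the clinical estimate.
        "beat_clinical": sum(d < clinical_delay_days for d in delays) / len(release),
    }


# Made-up example: three release runs and two no-release runs.
example_runs = [
    Run(release_day=10, alarm_day=14),
    Run(release_day=10, alarm_day=25),
    Run(release_day=10, alarm_day=None),
    Run(release_day=None, alarm_day=None),
    Run(release_day=None, alarm_day=30),
]
summary = summarise(example_runs, clinical_delay_days=7)
```

In this made-up example, two of the three releases are detected, but only one is detected sooner than the assumed 7-day clinical estimate, which mirrors the distinction the abstract draws between detecting a release and beating the time to clinical detection.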

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | PLOS Computational Biology | 1633 | Top 1.0% | 19.0%
2 | Epidemics | 104 | Top 0.1% | 7.3%
3 | Nature Medicine | 117 | Top 0.3% | 6.5%
4 | PLOS ONE | 4510 | Top 27% | 6.4%
5 | Clinical Infectious Diseases | 231 | Top 1% | 4.7%
6 | Scientific Reports | 3102 | Top 26% | 4.4%
7 | npj Digital Medicine | 97 | Top 1% | 4.4%
50% of probability mass above
8 | Nature Communications | 4913 | Top 37% | 4.0%
9 | Science Translational Medicine | 111 | Top 1% | 2.8%
10 | eLife | 5422 | Top 31% | 2.8%
11 | Nature Computational Science | 50 | Top 0.3% | 2.4%
12 | BMC Infectious Diseases | 118 | Top 2% | 1.8%
13 | Patterns | 70 | Top 1% | 1.5%
14 | PLOS Biology | 408 | Top 12% | 1.4%
15 | PLOS Pathogens | 721 | Top 7% | 1.1%
16 | Proceedings of the National Academy of Sciences | 2130 | Top 40% | 1.0%
17 | Journal of Infection | 71 | Top 2% | 0.9%
18 | Journal of Medical Internet Research | 85 | Top 4% | 0.9%
19 | Science Advances | 1098 | Top 26% | 0.9%
20 | Communications Medicine | 85 | Top 0.7% | 0.9%
21 | Journal of the American Medical Informatics Association | 61 | Top 2% | 0.9%
22 | Infectious Disease Modelling | 50 | Top 1% | 0.8%
23 | PLOS Digital Health | 91 | Top 3% | 0.8%
24 | Genome Medicine | 154 | Top 8% | 0.8%
25 | Cell Reports Methods | 141 | Top 5% | 0.8%
26 | iScience | 1063 | Top 31% | 0.8%
27 | Frontiers in Microbiology | 375 | Top 10% | 0.7%
28 | Frontiers in Public Health | 140 | Top 9% | 0.7%
29 | Emerging Infectious Diseases | 103 | Top 4% | 0.5%
30 | PLOS Global Public Health | 293 | Top 6% | 0.5%
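
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the listed probabilities. The sketch below uses the ten highest listed values; the function name `journals_to_reach` is hypothetical and chosen only for this illustration.

```python
from itertools import accumulate
from typing import List, Optional, Tuple

# Predicted probabilities (in percent) for the ten top-ranked journals above.
top_probabilities = [19.0, 7.3, 6.5, 6.4, 4.7, 4.4, 4.4, 4.0, 2.8, 2.8]


def journals_to_reach(
    probs: List[float], threshold: float
) -> Optional[Tuple[int, float]]:
    """Return (rank, cumulative %) at the first rank where the running
    total reaches the threshold, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None


result = journals_to_reach(top_probabilities, 50.0)
```

The running total is 48.3% after six journals and 52.7% after seven, so seven is the smallest number of journals whose combined probability reaches 50%.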