
Hidden in Plain Sight: Epidemiological Signals in Routine Laboratory Data

Hoffmann, T.; Mugahid, D.; Olejarz, J.; Neale, A.; Zapf, A.; Molinaro, R.; Lipsitch, M.; Atun, R.; Grad, Y.; Fortune, S.; Sampath, R.; Onnela, J.-P.

2026-02-06 public and global health
10.64898/2026.02.05.26345657 medRxiv

Public health monitoring traditionally relies on active reporting from diverse data sources, including clinical and administrative data, disease registries, and population-based surveys. Yet these surveillance methods often face challenges such as incomplete reporting, time lags, and variable population coverage. Meanwhile, diagnostic laboratories routinely generate vast volumes of operational data that are currently untapped for public health monitoring. As these data are not collected for scientific inquiry or population-level surveillance, they often lack formal validation and may contain sensitive information. We developed a Bayesian hierarchical model to decompose aggregated laboratory assay volume data for 1.1 billion clinician-ordered assays across the U.S. from October 2019 to March 2023 into interpretable epidemiological and health system signals. The signals generated by these models were compared with known perturbances to health systems, such as the COVID-19 pandemic. The method does not rely on assay outcomes or individual-level data, providing quantitative signals of epidemiological trends and health system responses while protecting both the privacy of patients and commercially sensitive information. Temporal analysis reveals qualitatively different responses of assay volumes to major public health events, identifying assays whose use paralleled surges in hospitalization rates during the COVID-19 pandemic documented through traditional public health reporting structures. This framework suggests that routine operational data can be used to augment traditional surveillance by identifying anomalous patterns for expert epidemiological investigation. To be truly effective, data from multiple vendors must be integrated to create a comprehensive real-time national or supranational public health surveillance platform.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. eLife: 5422 papers in training set; Top 0.1%; 28.3%
2. npj Digital Medicine: 97 papers in training set; Top 0.2%; 23.0%
50% of probability mass above
3. Nature Communications: 4913 papers in training set; Top 22%; 8.6%
4. Patterns: 70 papers in training set; Top 0.1%; 4.3%
5. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.7%; 3.9%
6. PLOS ONE: 4510 papers in training set; Top 46%; 2.5%
7. Genome Medicine: 154 papers in training set; Top 3%; 2.5%
8. Scientific Reports: 3102 papers in training set; Top 52%; 1.9%
9. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 30%; 1.8%
10. PLOS Computational Biology: 1633 papers in training set; Top 16%; 1.7%
11. PLOS Biology: 408 papers in training set; Top 9%; 1.7%
12. Cell: 370 papers in training set; Top 11%; 1.7%
13. Epidemics: 104 papers in training set; Top 1%; 1.4%
14. Journal of Medical Internet Research: 85 papers in training set; Top 3%; 1.4%
15. Molecular Systems Biology: 142 papers in training set; Top 0.8%; 1.4%
16. Science Advances: 1098 papers in training set; Top 26%; 0.9%
17. Nature Medicine: 117 papers in training set; Top 5%; 0.7%
18. Nature Computational Science: 50 papers in training set; Top 2%; 0.7%
19. Communications Biology: 886 papers in training set; Top 25%; 0.7%
20. Journal of Biomedical Informatics: 45 papers in training set; Top 2%; 0.7%
21. International Journal of Medical Informatics: 25 papers in training set; Top 2%; 0.7%
22. JMIR Medical Informatics: 17 papers in training set; Top 2%; 0.7%
23. The Lancet Digital Health: 25 papers in training set; Top 2%; 0.5%
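
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked directly from the listed percentages. The following is a minimal sketch, not part of the source page: the function name `journals_needed` and the 50% threshold parameter are illustrative assumptions, and the probabilities are simply copied from the ranked list in rank order.

```python
# Predicted probabilities (%) copied from the ranked list, in rank order (1 to 23).
probabilities = [
    28.3, 23.0, 8.6, 4.3, 3.9, 2.5, 2.5, 1.9, 1.8, 1.7, 1.7, 1.7,
    1.4, 1.4, 1.4, 0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.5,
]


def journals_needed(probs, threshold=50.0):
    """Hypothetical helper: return (k, cumulative) for the smallest top-k
    prefix whose summed probability reaches the threshold, or (None, total)
    if the threshold is never reached."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total


count, mass = journals_needed(probabilities)
print(f"Top {count} journals cover {mass:.1f}% of the predicted probability mass")
```

Under these assumptions, the cumulative sum first crosses the threshold at the second journal, which is consistent with where the page places its "50% of probability mass above" divider.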