Simpler is not always better: Phylodynamic misspecification and deep-learning corrections

Xie, R.; Gascuel, O.; Zhukova, A.

2026-05-08 epidemiology
10.64898/2026.05.07.26352661 medRxiv

Phylodynamics bridges the gap between epidemiology and pathogen genetic data by estimating epidemiological parameters from time-scaled pathogen phylogenies. Multi-type birth-death (MTBD) models are the phylodynamic analogues of compartmental models in classical epidemiology. They serve to infer the average number of secondary infections R and the infection duration d. More complex MTBD models add extra parameters, such as the average length of the incubation period or the proportion of superspreaders in the infected population. However, these additional parameters come at a substantial computational cost: apart from the simplest one, the birth-death (BD) model, MTBD models do not have a closed-form solution and require numerical methods for their likelihood computation. This leads to increased computation times and potential numerical errors. The BD model therefore remains researchers' preferred choice for analyses of real datasets, and is often applied even in cases where more complex epidemiological aspects are present. We investigated, using simulations, how model misspecification influences inference of R and d in the phylodynamic framework. We showed that using models that do not account for various epidemiological aspects leads to bias. In particular, the simplest estimator, BD, tends to underestimate R in the presence of superspreading or incubation, which might be dangerous from a public health perspective. However, deep-learning-based estimators for complex models, which account for multiple epidemiological factors, perform well both on data where those factors are present and on data where they are absent. This advocates for the use of complex, epidemiologically realistic estimators, whose design has recently become possible thanks to deep learning.
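To make the two target quantities concrete, here is a minimal sketch of how R and d relate to the rates of a plain BD model. It assumes the common parameterization with a transmission rate and a removal rate, where R is their ratio and d is the inverse of the removal rate. The abstract does not spell this parameterization out, and the names `BDParams` and `bd_from_R_d` are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BDParams:
    """Rates of a plain birth-death model (assumed parameterization)."""

    transmission_rate: float  # new infections per infected individual per unit time
    removal_rate: float       # rate at which an individual stops being infectious

    @property
    def R(self) -> float:
        # Average number of secondary infections: transmission rate / removal rate.
        return self.transmission_rate / self.removal_rate

    @property
    def d(self) -> float:
        # Average infection duration: inverse of the removal rate.
        return 1.0 / self.removal_rate


def bd_from_R_d(R: float, d: float) -> BDParams:
    """Convert (R, d) back to the two rates of the BD model."""
    if R <= 0 or d <= 0:
        raise ValueError("R and d must be positive")
    removal_rate = 1.0 / d
    return BDParams(transmission_rate=R * removal_rate, removal_rate=removal_rate)


# Illustrative values only, not taken from the paper.
params = bd_from_R_d(R=2.5, d=5.0)
```

Under this assumption a biased estimate of either rate propagates directly into R, which is why underestimating R under superspreading or incubation matters for public health.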

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology (1633 papers in training set; Top 0.1%): 42.5%
2. Epidemics (104 papers in training set; Top 0.2%): 6.8%
3. Nature Communications (4913 papers in training set; Top 35%): 4.6%
50% of probability mass above
4. Epidemiology and Infection (84 papers in training set; Top 0.4%): 3.9%
5. Scientific Reports (3102 papers in training set; Top 32%): 3.8%
6. PLOS ONE (4510 papers in training set; Top 41%): 3.3%
7. Journal of The Royal Society Interface (189 papers in training set; Top 1%): 3.1%
8. Statistics in Medicine (34 papers in training set; Top 0.1%): 2.2%
9. eLife (5422 papers in training set; Top 37%): 2.0%
10. Patterns (70 papers in training set; Top 0.6%): 2.0%
11. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 36%): 1.4%
12. Peer Community Journal (254 papers in training set; Top 3%): 1.0%
13. PLOS Biology (408 papers in training set; Top 15%): 1.0%
14. Bioinformatics (1061 papers in training set; Top 8%): 1.0%
15. American Journal of Epidemiology (57 papers in training set; Top 1%): 0.8%
16. Journal of Theoretical Biology (144 papers in training set; Top 2%): 0.8%
17. Science Advances (1098 papers in training set; Top 29%): 0.8%
18. International Journal of Epidemiology (74 papers in training set; Top 3%): 0.7%
19. Swiss Medical Weekly (12 papers in training set; Top 0.4%): 0.7%
20. BMC Infectious Diseases (118 papers in training set; Top 6%): 0.7%
21. Epidemiology (26 papers in training set; Top 0.7%): 0.5%
22. Infectious Disease Modelling (50 papers in training set; Top 1%): 0.5%
23. The Annals of Applied Statistics (15 papers in training set; Top 0.1%): 0.5%
24. Eurosurveillance (80 papers in training set; Top 2%): 0.5%
25. Methods in Ecology and Evolution (160 papers in training set; Top 3%): 0.5%
26. PLOS Genetics (756 papers in training set; Top 17%): 0.5%
27. Viruses (318 papers in training set; Top 6%): 0.5%