Interpreting the WaveSeekerNet model to reveal the evolution and biology of influenza A virus

Nguyen, H.-H.; Rudar, J.; Mubareka, S.; Lapen, D.; Berhane, Y.; Leung, C. K.; Lung, O.

2026-05-25 genomics
10.64898/2026.05.23.726879 bioRxiv

Background: Influenza A virus (IAV) is a major public health burden, causing seasonal epidemics and occasional pandemics. Its transmission from avian species to mammals, and its subsequent spread, requires adaptive changes in the viral genome. Understanding these molecular adaptations is essential for pandemic preparedness, and machine learning offers a powerful approach to uncovering the evolution and biology of IAV.

Results: Our calibrated WaveSeekerNet model accurately predicted the host source of 8 IAV segments (Macro F1-score: 0.9728), and calibration significantly improved the reliability of the predicted probabilities, with calibration errors approaching zero. Model interpretation showed that avian-adapted IAVs consistently activated G/C content, whereas mammalian-adapted IAVs generally activated A/T content. This distinction was confirmed by codon-level analysis, in which G/C-rich codons were rewarded for avian hosts and A/T-rich codons for mammalian hosts. We defined host-adaptive distance to quantify species barriers and proposed it as a risk-assessment metric. We hypothesized the Mammalian Adaptation Zone (MAZ), a zone that the virus is expected to reach by adjusting its host-adaptive distance, which helps it establish persistent mammalian lineages. The analysis also revealed the Hard Distance of avian-origin viruses (e.g., H5Nx, H9N2), indicating that they have not yet established persistent mammalian lineages. Finally, analysis of human H7N9 (2013, China) and non-human mammalian H5Nx (North America) viruses showed that WaveSeekerNet accurately identified key mammalian-adaptive mutations, including PB2-E627K and PB2-D701N.

Conclusions: WaveSeekerNet elucidated IAV host-adaptation mechanisms in silico, informing improved surveillance and intervention strategies.
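To make the G/C versus A/T distinction concrete, the following is a minimal sketch of how overall G/C content and third-codon-position G/C content can be computed for a coding sequence. It is not the authors' pipeline or part of WaveSeekerNet; the function names `gc_fraction` and `gc3_fraction` and the toy sequences are assumptions made purely for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T/U) that are G or C."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "ATU")
    total = gc + at
    return gc / total if total else 0.0


def gc3_fraction(cds: str) -> float:
    """G/C fraction at the third position of each complete codon."""
    cds = cds.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    thirds = "".join(cds[i + 2] for i in range(0, len(cds) - 2, 3))
    return gc_fraction(thirds)


# Toy sequences (hypothetical, not real IAV data).
gc_rich = "ATGGCCGCG"
at_rich = "ATGAAATTT"
print(gc_fraction(gc_rich), gc3_fraction(gc_rich))
print(gc_fraction(at_rich), gc3_fraction(at_rich))
```

Third-position G/C content is a common way to summarize codon-level composition because many third-position changes are synonymous; whether it matches the codon-level analysis used in the paper is not stated in the abstract.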

Matching journals

The top 13 journals account for 50% of the predicted probability mass.

1 | PLOS Computational Biology | 1633 papers in training set | Top 1% | 18.6%
2 | Scientific Reports | 3102 papers in training set | Top 24% | 4.8%
3 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 1% | 4.3%
4 | Virus Evolution | 140 papers in training set | Top 0.4% | 3.6%
5 | Viruses | 318 papers in training set | Top 2% | 3.6%
6 | PLOS Biology | 408 papers in training set | Top 5% | 2.6%
7 | PLOS ONE | 4510 papers in training set | Top 46% | 2.4%
8 | Bioinformatics | 1061 papers in training set | Top 7% | 2.1%
9 | Journal of Virology | 456 papers in training set | Top 2% | 2.1%
10 | Communications Biology | 886 papers in training set | Top 5% | 2.1%
11 | eLife | 5422 papers in training set | Top 38% | 1.9%
12 | Nature Communications | 4913 papers in training set | Top 50% | 1.8%
13 | Emerging Microbes & Infections | 74 papers in training set | Top 0.7% | 1.8%
50% of probability mass above
14 | GigaScience | 172 papers in training set | Top 1% | 1.7%
15 | Frontiers in Immunology | 586 papers in training set | Top 4% | 1.7%
16 | BMC Bioinformatics | 383 papers in training set | Top 4% | 1.7%
17 | Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.7%
18 | PeerJ | 261 papers in training set | Top 7% | 1.7%
19 | iScience | 1063 papers in training set | Top 16% | 1.7%
20 | PLOS Pathogens | 721 papers in training set | Top 6% | 1.5%
21 | Journal of Infection | 71 papers in training set | Top 1% | 1.5%
22 | Frontiers in Public Health | 140 papers in training set | Top 5% | 1.5%
23 | Virus Research | 36 papers in training set | Top 0.6% | 1.5%
24 | Frontiers in Genetics | 197 papers in training set | Top 6% | 1.3%
25 | BMC Infectious Diseases | 118 papers in training set | Top 3% | 1.3%
26 | PNAS Nexus | 147 papers in training set | Top 0.5% | 1.3%
27 | Patterns | 70 papers in training set | Top 1% | 1.2%
28 | mSystems | 361 papers in training set | Top 6% | 1.2%
29 | Frontiers in Microbiology | 375 papers in training set | Top 7% | 1.2%
30 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 0.9%
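
The 50% cutoff stated for this list can be checked by accumulating the listed probabilities in rank order. The sketch below does that with the 30 percentages shown on this page; the helper name `ranks_to_reach` is an assumption for illustration, and the values are rounded as displayed, so the running totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-30, copied from the list above.
probs = [
    18.6, 4.8, 4.3, 3.6, 3.6, 2.6, 2.4, 2.1, 2.1, 2.1,
    1.9, 1.8, 1.8, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.5,
    1.5, 1.5, 1.5, 1.3, 1.3, 1.3, 1.2, 1.2, 1.2, 0.9,
]


def ranks_to_reach(values, threshold):
    """Return the first rank (1-based) at which the running total meets the threshold."""
    for rank, total in enumerate(accumulate(values), start=1):
        if total >= threshold:
            return rank
    return None  # threshold never reached with the values given


cutoff = ranks_to_reach(probs, 50.0)
print(cutoff)  # → 13
```

With the rounded values, the running total is about 49.9% after rank 12 and about 51.7% after rank 13, which agrees with the statement that the top 13 journals account for 50% of the predicted probability mass.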