
Comprehensive interaction profiling and machine learning prediction of bacteriophage infectivity across clinically diverse Pseudomonas aeruginosa

Piya, D.; Noonan, A. J. C.; Selvakumar, H.; Alayouni, M.; Koderi Valappil, S.; Maucourt, F.; Murray, I.; Svab, M.; Bousliman, C.; Heidenblut, M.; Orihuela, B.; Kazakov, A.; Carlson, H.; Yao, Y.; Smith, E.; Roux, S.; Deutschbauer, A.; Inman, J.; Arkin, A. P.; Mutalik, V. K.

2026-05-20 microbiology
10.64898/2026.05.19.726084 bioRxiv

The rise of antibiotic-resistant bacterial infections has driven renewed interest in bacteriophage therapy, where viruses that specifically kill bacteria are used as targeted antimicrobials. Pseudomonas aeruginosa, a WHO critical-priority pathogen that causes severe infections in hospitalized and immunocompromised patients, presents a major challenge for phage therapy because of its extraordinary genetic diversity. Phages effective against one bacterial strain often fail against others, and existing cross-resistance-profiling approaches require iterative empirical testing of each new patient isolate. To establish a genome-based framework for rapid phage-isolate matching, we assembled a collection of 95 genomically diverse P. aeruginosa phages representing 20 genera and tested each against 99 genetically diverse clinical isolates, generating 9,405 infection outcome measurements. Bacterial O-antigen serotype emerged as the dominant determinant of strain susceptibility, while defense systems, anti-defense systems, and prophage burden contributed smaller strain-specific effects. The full curated multivariate model explained 47% of strain-susceptibility variance. Machine-learning models integrating these features and pangenome-derived gene clusters reached a per-strain AUROC of 0.86. In an in vivo proof-of-concept test against a single held-out strain, the ML-designed cocktail produced a ~12-fold greater median CFU reduction than the expert-designed cocktail (q = 0.045), with both cocktails substantially reducing burden relative to the untreated control (~113-fold for ML, ~9-fold for CG; both q < 10⁻³). SHAP analysis of the model identified bacterial surface-architecture genes (LPS biosynthesis, outer membrane proteins, type IV pili) as the dominant predictors, with defense-system content modulating which specific phages succeed against a strain rather than uniformly damping susceptibility. Together, these results establish a genome-based framework for predicting phage susceptibility in genetically diverse clinical isolates.
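The abstract reports a per-strain AUROC but does not say how it was computed. One common reading is sketched below: compute an AUROC for each strain over its phage predictions, then average across strains. This is an assumption for illustration, not the authors' evaluation code. The record layout, the function names `auroc` and `per_strain_auroc`, and the toy scores are all hypothetical.

```python
from statistics import mean

def auroc(labels, scores):
    """AUROC as P(score of an infecting pair > score of a non-infecting pair).

    Ties count as 0.5. Returns None when only one outcome class is present,
    because AUROC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_strain_auroc(records):
    """Group (strain, phage, infected, score) records by strain and score each.

    Returns the per-strain AUROC dict and the mean over strains where the
    AUROC is defined (assumed aggregation; the source does not specify it).
    """
    by_strain = {}
    for strain, _phage, infected, score in records:
        labels, scores = by_strain.setdefault(strain, ([], []))
        labels.append(infected)
        scores.append(score)
    per_strain = {s: auroc(l, sc) for s, (l, sc) in by_strain.items()}
    defined = [v for v in per_strain.values() if v is not None]
    return per_strain, mean(defined)

# Hypothetical toy data: two strains, four phages each.
toy = [
    ("strainA", "phage1", 1, 0.9),
    ("strainA", "phage2", 0, 0.2),
    ("strainA", "phage3", 1, 0.6),
    ("strainA", "phage4", 0, 0.7),
    ("strainB", "phage1", 0, 0.1),
    ("strainB", "phage2", 1, 0.8),
    ("strainB", "phage3", 0, 0.3),
    ("strainB", "phage4", 1, 0.4),
]
per_strain, macro = per_strain_auroc(toy)
print(per_strain, macro)
```

Scoring each strain separately measures how well phages are ranked for a given isolate, which is the question a phage-isolate matching workflow has to answer. A pooled AUROC over all 9,405 pairs could instead be dominated by broadly susceptible or broadly resistant strains.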

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cell Host & Microbe: 113 papers in training set; Top 0.2%; 13.9%
2. Cell Systems: 167 papers in training set; Top 0.9%; 12.1%
3. Nature Microbiology: 133 papers in training set; Top 0.1%; 9.8%
4. Nature Communications: 4913 papers in training set; Top 20%; 9.8%
5. Science: 429 papers in training set; Top 7%; 4.7%
50% of probability mass above
6. Nature: 575 papers in training set; Top 6%; 4.2%
7. Genome Medicine: 154 papers in training set; Top 2%; 3.8%
8. Cell Reports: 1338 papers in training set; Top 16%; 3.5%
9. Cell: 370 papers in training set; Top 7%; 3.5%
10. eLife: 5422 papers in training set; Top 34%; 2.3%
11. mBio: 750 papers in training set; Top 6%; 2.0%
12. Cell Reports Medicine: 140 papers in training set; Top 3%; 2.0%
13. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 29%; 2.0%
14. Science Translational Medicine: 111 papers in training set; Top 2%; 1.8%
15. Science Advances: 1098 papers in training set; Top 16%; 1.8%
16. Cell Genomics: 162 papers in training set; Top 3%; 1.7%
17. Nature Biotechnology: 147 papers in training set; Top 5%; 1.6%
18. Nature Ecology & Evolution: 113 papers in training set; Top 3%; 1.6%
19. Nature Genetics: 240 papers in training set; Top 5%; 1.6%
20. PLOS Biology: 408 papers in training set; Top 12%; 1.4%
21. Nature Medicine: 117 papers in training set; Top 3%; 1.4%
22. mSystems: 361 papers in training set; Top 6%; 0.9%
23. Immunity: 58 papers in training set; Top 4%; 0.8%
24. Nucleic Acids Research: 1128 papers in training set; Top 19%; 0.7%
25. Microbiome: 139 papers in training set; Top 3%; 0.7%
26. Molecular Cell: 308 papers in training set; Top 11%; 0.7%
27. Advanced Science: 249 papers in training set; Top 23%; 0.6%
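
The 50% cutoff in the list above can be checked by accumulating the listed probabilities in rank order. The sketch below is a minimal illustration that takes the final percentage of each entry to be its predicted probability. The helper name `journals_to_reach` is hypothetical, and only the first six entries are included.

```python
# Predicted probabilities (percent) for the six top-ranked journals, in rank order.
ranked = [
    ("Cell Host & Microbe", 13.9),
    ("Cell Systems", 12.1),
    ("Nature Microbiology", 9.8),
    ("Nature Communications", 9.8),
    ("Science", 4.7),
    ("Nature", 4.2),
]

def journals_to_reach(ranked, threshold):
    """Return (count, cumulative %) at the first rank whose running total meets threshold."""
    total = 0.0
    for count, (_journal, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return count, total
    return None  # threshold not reached within the listed journals

count, total = journals_to_reach(ranked, 50.0)
print(count, round(total, 1))
```

The running total after four journals is 45.6%, so the fifth entry (Science, 4.7%) is the one that carries the sum past 50%, consistent with the cutoff marker.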