
Deep Learning-Based Structure Modeling of the Treponema pallidum Proteome: Insights into Pathogenesis and Syphilis Vaccine Development

Houston, S.; Marshall, S.; Miller, A.; Palkowski, A.; Alfaro, J.; Cameron, C.

2026-05-07 microbiology
10.64898/2026.05.05.717303 bioRxiv

Treponema pallidum ssp. pallidum, the causative agent of syphilis, has a small proteome and encompasses numerous strains. Knowledge gaps remain in understanding the molecular mechanisms of pathogenesis of this bacterium, as well as the structure and function of the full complement of proteins encoded by T. pallidum. Here, an AI-based structure-to-function modeling workflow was used to investigate the complement of proteins encoded by T. pallidum. High-confidence structure models were generated for 976 T. pallidum proteins, covering 99% of the proteome. Analysis of the generated models using the protein structure comparison server DALI enabled high-confidence, structure-based functional annotation of 877 T. pallidum proteins, including 240 of the 323 proteins of unknown function encoded by this pathogen. Additionally, 63 putative pathogenesis related proteins (PPRPs) and seven treponemal proteins with previously uncharacterized similarity to outer membrane proteins (OMPs) from Gram-negative bacteria were identified. A workflow for B cell epitope (BCE) prediction identified 1133 surface-exposed, host-facing potential epitopes in known and predicted T. pallidum OMPs, of which 92 were prioritized based on bioinformatic analyses, biophysical properties, amino acid sequence conservation, and previous protein expression data. This work provides insight into T. pallidum pathogenesis through structure modeling-based functional annotation, including characterization of proteins of unknown function. This study also informs syphilis vaccine design by identifying new potential T. pallidum OMPs, as well as host-facing regions of T. pallidum OMPs that have conserved amino acid sequences in globally circulating strains.

Statement of importance/impact

This study presents the first AI-based global structure modeling-to-function analysis of the proteome of Treponema pallidum, the bacterium that causes syphilis. Structure-based functional predictions of previously uncharacterized proteins, including proteins potentially involved in virulence, provide novel insight into mechanisms of pathogenesis. The work also informs syphilis vaccine development by the identification and structural characterization of new candidate vaccine proteins in globally circulating strains of T. pallidum.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Frontiers in Cellular and Infection Microbiology | 98 | Top 0.1% | 10.3%
2 | PLOS Neglected Tropical Diseases | 378 | Top 0.9% | 10.3%
3 | mSystems | 361 | Top 1% | 7.3%
4 | PLOS Computational Biology | 1633 | Top 4% | 7.3%
5 | Frontiers in Immunology | 586 | Top 1% | 4.9%
6 | mBio | 750 | Top 3% | 4.9%
7 | Virulence | 21 | Top 0.1% | 4.4%
8 | Frontiers in Microbiology | 375 | Top 2% | 4.0%
50% of probability mass above
9 | Microbiology Spectrum | 435 | Top 0.7% | 3.7%
10 | Journal of Clinical Microbiology | 120 | Top 0.5% | 3.6%
11 | Scientific Reports | 3102 | Top 40% | 3.1%
12 | Microbial Genomics | 204 | Top 0.7% | 2.8%
13 | PLOS Pathogens | 721 | Top 5% | 2.4%
14 | mSphere | 281 | Top 2% | 2.1%
15 | Journal of Infection | 71 | Top 1.0% | 1.9%
16 | ACS Infectious Diseases | 74 | Top 0.6% | 1.7%
17 | Frontiers in Medicine | 113 | Top 4% | 1.5%
18 | PLOS ONE | 4510 | Top 57% | 1.4%
19 | Pathogens | 53 | Top 0.8% | 1.2%
20 | PLOS Biology | 408 | Top 13% | 1.2%
21 | PeerJ | 261 | Top 11% | 1.0%
22 | Infection, Genetics and Evolution | 43 | Top 0.6% | 1.0%
23 | Nature Communications | 4913 | Top 60% | 0.9%
24 | BMC Genomics | 328 | Top 4% | 0.9%
25 | Toxins | 14 | Top 0.1% | 0.9%
26 | Emerging Microbes & Infections | 74 | Top 1% | 0.8%
27 | npj Biofilms and Microbiomes | 56 | Top 2% | 0.8%
28 | eLife | 5422 | Top 59% | 0.7%
29 | Microbial Pathogenesis | 13 | Top 0.9% | 0.5%
30 | Infection and Immunity | 103 | Top 0.9% | 0.5%
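
The 50% cutoff stated for the journal ranking can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `journals_needed` is hypothetical, and the input is simply the ten highest-ranked percentages as they appear in the ranking.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the ranking.
probabilities = [10.3, 10.3, 7.3, 7.3, 4.9, 4.9, 4.4, 4.0, 3.7, 3.6]


def journals_needed(probs, threshold=50.0):
    """Return the smallest number of top-ranked entries whose running
    sum reaches `threshold`, or None if the threshold is never reached.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count
    return None


top_k = journals_needed(probabilities)
print(top_k)
```

With the listed values, the running sum is about 49.4% after seven journals and about 53.4% after eight, which is consistent with the statement that the top 8 journals account for 50% of the predicted probability mass.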