
ECLIPSE: Exploring the dark proteome of ESKAPE pathogens through the sequence similarity network of the Protein Universe Atlas

Lata, S.; Heinz, D. W.

2026-04-01 bioinformatics
10.64898/2026.03.30.715302 bioRxiv

Motivation: The accelerating crisis of antimicrobial resistance among the critical ESKAPE pathogens demands the urgent identification of novel molecular targets. However, a substantial fraction of bacterial proteomes remains functionally uncharacterised, with many genes annotated as encoding hypothetical proteins. These protein sequences often lack significant similarity to known protein families under conventional homology-based annotation methods and thus remain "dark". This limits our ability to explore their role in pathogenicity, so new strategies are needed to illuminate these "dark" regions of the ESKAPE panproteomes.

Results: We introduce ECLIPSE (ESKAPE Connectome Linkage and Inference for Proteome Sequence Exploration), a network-based computational framework that systematically identifies and prioritises functionally uncharacterised protein families in bacterial panproteomes. ECLIPSE embeds target pathogen proteomes within the global sequence similarity network of the Protein Universe Atlas and detects connected components composed entirely of unannotated proteins, collectively termed the "dark proteome". As a case study, we applied ECLIPSE to a panproteome of 3,460,657 protein sequences from 635 different strains of Pseudomonas aeruginosa. ECLIPSE identified 120,985 proteins (4%) residing in completely dark connected components. We further applied taxonomic diversity analysis based on normalised Shannon indices to characterise each dark component by its enrichment in ESKAPE pathogens; the resulting evenness (E) value distinguishes Pseudomonas-specific from ESKAPE-enriched dark components. The Dark Proteome Prioritisation Score (DPPS), a composite multi-dimensional scoring framework, ranked these candidates by biological relevance across four orthogonal axes: (i) functional darkness, (ii) P. aeruginosa proportion in the Atlas, (iii) AMR-clade taxonomic restriction, and (iv) conservation across 635 P. aeruginosa strains. DPPS assigns the components to four score-based tiers, and the prioritised Tier I components were validated by a weight sensitivity analysis, remaining stable across 500 Monte Carlo weight perturbations. Structural characterisation of one of the top-ranked ESKAPE-enriched candidates revealed a novel beta-barrel fold belonging to the DUF1302 family with no experimentally characterised structural homologue in the PDB; the candidate was co-localised with a LuxR-type transcriptional regulator in conserved gene neighbourhoods across multiple P. aeruginosa strains. Collectively, ECLIPSE identifies evolutionarily conserved, structurally defined, and functionally uncharacterised proteins enriched across ESKAPE pathogens, which can facilitate experimental characterisation of these dark proteins as potential antimicrobial targets.

Availability and implementation: The source code and dataset are freely available at https://github.com/surabhilata/ECLIPSE.git
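The two network steps named in the abstract, finding connected components made up entirely of unannotated proteins and scoring each one with a normalised Shannon index, can be illustrated with a small sketch. This is not the ECLIPSE implementation. The function names, the toy protein identifiers, and the taxon labels are all hypothetical. The use of Pielou evenness (E = H / ln S) as the normalised Shannon index is an assumption, as is returning 0 for a component with a single taxon.

```python
import math
from collections import Counter, defaultdict


def connected_components(edges):
    """Connected components of an undirected graph given as (u, v) pairs.

    Nodes that appear in no edge are not seen by this sketch.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def dark_components(edges, annotated):
    """Keep only components in which no member carries an annotation."""
    return [c for c in connected_components(edges) if not (c & annotated)]


def normalised_shannon_evenness(labels):
    """Pielou evenness E = H / ln(S); assumed to be 0.0 for a single taxon."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))


# Toy similarity network (hypothetical identifiers, not data from the study).
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p6", "p7")]
annotated = {"p3"}  # p3 has a known function, so its component is not dark
taxon = {"p4": "Pseudomonas", "p5": "Pseudomonas",
         "p6": "Pseudomonas", "p7": "Klebsiella"}

for component in dark_components(edges, annotated):
    evenness = normalised_shannon_evenness([taxon[p] for p in component])
    print(sorted(component), round(evenness, 3))
```

In this toy network the component containing p3 is discarded because one member is annotated. The component with a single genus gets E = 0, and the component split evenly between two genera gets E = 1.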

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 2%, 18.4%
2. mSystems: 361 papers in training set, Top 0.7%, 10.0%
3. PLOS Computational Biology: 1633 papers in training set, Top 4%, 8.3%
4. Journal of Proteome Research: 215 papers in training set, Top 0.6%, 4.8%
5. Microbial Genomics: 204 papers in training set, Top 0.6%, 3.6%
6. Nature Communications: 4913 papers in training set, Top 40%, 3.6%
7. Genome Biology: 555 papers in training set, Top 3%, 2.7%

50% of probability mass above

8. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.6%
9. Scientific Reports: 3102 papers in training set, Top 50%, 2.1%
10. mSphere: 281 papers in training set, Top 3%, 2.1%
11. Nucleic Acids Research: 1128 papers in training set, Top 9%, 1.9%
12. BMC Bioinformatics: 383 papers in training set, Top 4%, 1.8%
13. PLOS ONE: 4510 papers in training set, Top 54%, 1.7%
14. GigaScience: 172 papers in training set, Top 1%, 1.7%
15. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.7%
16. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.6%
17. Frontiers in Microbiology: 375 papers in training set, Top 6%, 1.5%
18. Communications Biology: 886 papers in training set, Top 13%, 1.3%
19. Patterns: 70 papers in training set, Top 1%, 1.3%
20. Briefings in Bioinformatics: 326 papers in training set, Top 5%, 1.2%
21. Genomics: 60 papers in training set, Top 2%, 0.9%
22. iScience: 1063 papers in training set, Top 25%, 0.9%
23. Journal of Chemical Information and Modeling: 207 papers in training set, Top 3%, 0.9%
24. Genome Medicine: 154 papers in training set, Top 7%, 0.9%
25. npj Systems Biology and Applications: 99 papers in training set, Top 2%, 0.9%
26. Frontiers in Cellular and Infection Microbiology: 98 papers in training set, Top 5%, 0.8%
27. Frontiers in Immunology: 586 papers in training set, Top 7%, 0.8%
28. Microbiome: 139 papers in training set, Top 3%, 0.7%
29. Molecular Systems Biology: 142 papers in training set, Top 2%, 0.7%
30. Microbiology Spectrum: 435 papers in training set, Top 6%, 0.7%