ECLIPSE: Exploring the dark proteome of ESKAPE pathogens through the sequence similarity network of the Protein Universe Atlas

Lata, S.; Heinz, D. W.

2026-04-01 bioinformatics

10.64898/2026.03.30.715302 bioRxiv

Motivation: The accelerating crisis of antimicrobial resistance among the critical ESKAPE pathogens demands the urgent identification of novel molecular targets. However, a substantial fraction of bacterial proteomes remains functionally uncharacterised, with many genes annotated as encoding hypothetical proteins. These protein sequences often lack significant similarity to known protein families under conventional homology-based annotation methods and thus remain "dark". This limits our ability to explore their role in pathogenicity, so new strategies are needed to illuminate these "dark" regions of the ESKAPE panproteomes.

Results: We introduce ECLIPSE (ESKAPE Connectome Linkage and Inference for Proteome Sequence Exploration), a network-based computational framework that systematically identifies and prioritises functionally uncharacterised protein families in bacterial panproteomes. ECLIPSE embeds target pathogen proteomes within the global sequence similarity network of the Protein Universe Atlas and detects connected components composed entirely of unannotated proteins, which together constitute the "dark proteome". As a case study, we applied ECLIPSE to a panproteome of 3,460,657 protein sequences from 635 strains of Pseudomonas aeruginosa. ECLIPSE identified 120,985 proteins (4%) residing in completely dark connected components. We further applied taxonomic diversity analysis based on normalised Shannon indices to characterise each dark component by its enrichment in ESKAPE pathogens; the resulting evenness (E) value distinguishes Pseudomonas-specific from ESKAPE-enriched dark components. The Dark Proteome Prioritisation Score (DPPS), a composite multi-dimensional scoring framework, ranked these candidates by biological relevance across four orthogonal axes: (i) functional darkness, (ii) P. aeruginosa proportion in the Atlas, (iii) AMR-clade taxonomic restriction, and (iv) conservation across the 635 P. aeruginosa strains. The DPPS sorts components into four tiers, and the prioritised Tier I components were validated by a weight sensitivity analysis, in which the ranking remained stable across 500 Monte Carlo weight perturbations. Structural characterisation of one of the top-ranked ESKAPE-enriched candidates revealed a novel beta-barrel fold belonging to the DUF1302 family, with no experimentally characterised structural homologue in the PDB; the candidate was co-localised with a LuxR-type transcriptional regulator in conserved gene neighbourhoods across multiple P. aeruginosa strains. Collectively, ECLIPSE identifies evolutionarily conserved, structurally defined, and functionally uncharacterised proteins enriched across ESKAPE pathogens, which can facilitate experimental characterisation of these dark proteins as potential antimicrobial targets.

Availability and implementation: The source code and dataset are freely available at https://github.com/surabhilata/ECLIPSE.git
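
The two core ideas in the abstract, detecting fully dark connected components and scoring their taxonomic evenness, can be illustrated with a minimal Python sketch. This is not the ECLIPSE implementation: the function names, the edge-list input, the boolean annotation flags, and the toy data are all assumptions made for illustration, and the evenness formula used here is the standard normalised Shannon index E = H / ln(S), which may differ in detail from the authors' definition.

```python
import math
from collections import Counter, defaultdict


def connected_components(edges, nodes):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from each unvisited node.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def dark_components(edges, annotated):
    """Keep only components in which no member carries an annotation.

    `annotated` maps every protein ID to True (annotated) or False (dark);
    this input shape is a hypothetical choice, not the ECLIPSE format.
    """
    nodes = sorted(annotated)
    return [c for c in connected_components(edges, nodes)
            if not any(annotated[p] for p in c)]


def shannon_evenness(taxa):
    """Normalised Shannon index E = H / ln(S); 0.0 if fewer than two taxa."""
    counts = Counter(taxa)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return h / math.log(len(counts))


# Toy similarity network (hypothetical protein IDs).
annotated = {"p1": False, "p2": False, "p3": True, "p4": False, "p5": False}
edges = [("p1", "p2"), ("p3", "p4")]
dark = dark_components(edges, annotated)
```

In this toy network the component containing `p3` is excluded because one member is annotated, while the `p1`/`p2` pair and the singleton `p5` are retained as dark. An evenness near 0 indicates a component dominated by a single taxon, and a value near 1 indicates members spread evenly across taxa; any cut-off separating Pseudomonas-specific from ESKAPE-enriched components would have to come from the paper itself.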
