Evaluation of Protein Reference Database Reduction and Its Impact on Peptide-Centric Metaproteomics
Vande Moortele, T.; Van de Vyver, S.; Binke, B.-B.; Van Den Bossche, T.; Dawyndt, P.; Martens, L.; Mesuere, B.; Verschaffelt, P.
Introduction/Background: Recent large-scale restructurings of UniProtKB included removal of redundant entries, exclusion of taxonomically unclassified organisms, and a shift toward a more reference-proteome-centered approach. This raised concerns about the stability of peptide-centric metaproteomics workflows. In parallel, metagenomics-assisted "targeted" database restriction is often proposed to reduce ambiguity, but its net impact on peptide-centric interpretation remains unclear.
Methods: We assessed the impact of three complementary factors on the taxonomic profiling of metaproteomics analyses: (i) successive global UniProtKB reductions, (ii) metagenomics-derived targeted database restriction, and (iii) Unipept's internal taxon validation filter. Peptide lists from two public metaproteomics datasets (human gut and marine hatchery) were analysed with Unipept and compared across sequential UniProtKB configurations and custom SSU/LSU-derived filtered databases.
Results: Across both environments, progressive UniProtKB downsizing reduced peptide coverage but did not fundamentally alter the most abundant taxa, and it substantially lowered ambiguous root-level assignments. This suggests that the reduction in ambiguity stemmed from decreased redundancy rather than from a loss of meaningful biological information. Metagenomics-assisted targeted filtering introduced a clear trade-off: it markedly reduced peptide matches while producing only modest changes in resolution at lower taxonomic ranks. It did, however, consistently reduce non-specific root-level assignments. The effects on taxon discoverability and relative abundances were heavily dependent on the environment, with stronger shifts observed in the less well-represented marine dataset. Finally, the added benefit of Unipept's internal taxon validation filter decreased across newer, more curated database configurations. It had the largest impact on older, more inclusive releases and became minimal under the reference-proteome-focused setup.
Discussion/Conclusion: Overall, UniProtKB restructuring does not destabilize peptide-centric metaproteomic analyses. Instead, it tends to reduce ambiguity while preserving high-level community structure. Targeted database restriction offers a trade-off between sensitivity and reduced ambiguity in a strongly context-dependent manner. As UniProtKB becomes increasingly curated and reference-proteome-centered, the need for additional internal taxonomic filtering in Unipept appears to diminish.
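To make the two quantities discussed in the abstract concrete (peptide coverage and the share of root-level assignments), the following is a minimal sketch of a lowest-common-ancestor (LCA) assignment over a toy taxonomy. Everything in it is an illustrative assumption: the taxonomy, the peptide labels, the peptide-to-taxon mappings, and the function names are hypothetical, and the code is not Unipept's implementation nor data from the study.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); "root" has no parent.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Eukaryota": "root",
    "Bacteroides": "Bacteria",
    "Escherichia": "Bacteria",
    "Homo": "Eukaryota",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to the given taxon."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]  # root first


def lowest_common_ancestor(taxa: List[str]) -> str:
    """Deepest taxon shared by the lineages of all given taxa."""
    lca = "root"
    for ranks in zip(*(lineage(t) for t in taxa)):
        if len(set(ranks)) != 1:
            break  # lineages diverge below this point
        lca = ranks[0]
    return lca


def coverage(peptide_to_taxa: Dict[str, List[str]]) -> float:
    """Fraction of peptides with at least one database match."""
    if not peptide_to_taxa:
        return 0.0
    matched = sum(1 for taxa in peptide_to_taxa.values() if taxa)
    return matched / len(peptide_to_taxa)


def root_fraction(peptide_to_taxa: Dict[str, List[str]]) -> float:
    """Fraction of matched peptides whose LCA is the root."""
    matched = [taxa for taxa in peptide_to_taxa.values() if taxa]
    if not matched:
        return 0.0
    at_root = sum(1 for taxa in matched if lowest_common_ancestor(taxa) == "root")
    return at_root / len(matched)


# Hypothetical matches against an older, more inclusive database.
FULL_DB = {
    "pep1": ["Bacteroides", "Homo"],         # LCA: root (ambiguous)
    "pep2": ["Bacteroides", "Escherichia"],  # LCA: Bacteria
    "pep3": ["Escherichia"],                 # LCA: Escherichia
    "pep4": ["Homo"],                        # LCA: Homo
}

# Hypothetical matches after reduction: one conflicting entry for pep1
# is gone, and pep4 no longer matches anything.
REDUCED_DB = {
    "pep1": ["Bacteroides"],
    "pep2": ["Bacteroides", "Escherichia"],
    "pep3": ["Escherichia"],
    "pep4": [],
}

if __name__ == "__main__":
    for name, db in (("full", FULL_DB), ("reduced", REDUCED_DB)):
        print(name, "coverage:", coverage(db), "root fraction:", root_fraction(db))
```

In this toy case the reduced database matches fewer peptides but assigns none of them to the root, which mirrors the kind of trade-off between coverage and ambiguity that the abstract describes. The numbers themselves carry no empirical meaning.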