
PROTEOMICS

Wiley

Preprints posted in the last 30 days, ranked by how well they match PROTEOMICS's content profile, based on 35 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Systems-Informed prioritization of Exosomal Protein Candidates in TNBC Identifies an ECM Invasion Module and Nominates Agrin as a High-Priority Target

Nguyen, T. M.

2026-05-19 cancer biology 10.64898/2026.05.14.725271 medRxiv
Top 0.1%
12.6%

Background: Triple-negative breast cancer (TNBC) remains the most clinically challenging breast cancer subtype, in part due to the absence of validated molecular targets and the limited availability of non-invasive early detection strategies. Tumor-derived exosomes have emerged as promising liquid biopsy analytes, yet the functional organization of their protein cargo and the identification of biologically meaningful candidates remain incompletely characterized. Methods: We present a Composite Driver Score (CDS) framework that integrates differential expression magnitude with protein-protein interaction network topology and Analytic Hierarchy Process (AHP)-based multi-criteria weighting to prioritize exosomal protein candidates in a systems-informed manner. The framework was applied to publicly available label-free quantitative proteomic datasets comparing exosomal fractions from MDA-MB-231 (TNBC) and MCF-10A (non-tumorigenic) cells, with cross-dataset validation performed on an independent proteomic dataset. Results: CDS prioritization was robust to variations in proteome depth and parameter weighting, consistently recovering a functionally coherent set of extracellular matrix (ECM) and adhesion-associated proteins. Network and pathway analyses revealed coordinated co-enrichment of integrin receptors, cognate ECM ligands, and associated co-receptors, consistent with selective packaging of a functionally integrated invasion module. Agrin (AGRN), a heparan sulfate proteoglycan with little prior characterization in TNBC exosome biology, emerged as a high-priority candidate through its network integration within this ECM program. Conclusions: These findings support a model in which TNBC-derived exosomes carry coordinated molecular programs capable of modulating extracellular matrix architecture.
The CDS framework offers a transferable strategy for integrative exosomal biomarker prioritization and a systems-level foundation for targeted liquid biopsy panel development.
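The CDS idea of combining an expression effect size with network topology under AHP-derived weights can be sketched as below. This is not the authors' implementation: the three criteria (absolute log2 fold change, degree, betweenness), the min-max scaling and the pairwise judgments are hypothetical choices for illustration. AHP weights are taken as the normalized principal eigenvector of the pairwise-comparison matrix, and the consistency ratio uses Saaty's random indices.

```python
import numpy as np

# Saaty's random consistency indices for small matrices (n = 3..5)
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(pairwise):
    """AHP priority weights: the principal eigenvector, normalized to sum to 1."""
    values, vectors = np.linalg.eig(pairwise)
    principal = np.abs(vectors[:, np.argmax(values.real)].real)
    return principal / principal.sum()

def consistency_ratio(pairwise):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is the usual cut-off."""
    n = pairwise.shape[0]
    lambda_max = np.max(np.linalg.eigvals(pairwise).real)
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]

def min_max(x):
    """Scale one criterion to [0, 1] so criteria on different scales can be combined."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def composite_score(log2fc, degree, betweenness, weights):
    """Weighted sum of scaled |log2FC|, degree and betweenness (hypothetical criteria)."""
    criteria = np.column_stack([
        min_max(np.abs(log2fc)),
        min_max(degree),
        min_max(betweenness),
    ])
    return criteria @ weights

# Hypothetical judgments: effect size vs degree = 3, effect size vs betweenness = 5,
# degree vs betweenness = 2; reciprocals fill the lower triangle.
PAIRWISE = np.array([[1.0, 3.0, 5.0],
                     [1 / 3, 1.0, 2.0],
                     [1 / 5, 1 / 2, 1.0]])
weights = ahp_weights(PAIRWISE)
scores = composite_score([3.0, -1.0, 0.5], [20, 5, 1], [0.3, 0.1, 0.0], weights)
```

With these judgments the weights come out at roughly 0.65, 0.23 and 0.12, and the toy protein that tops all three criteria receives a score of 1.0.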

2
De-N-glycosylation of in vivo and in vitro adipogenic stem cell products unmasks differential expression of CD36 glycoprotein in human adipogenesis

Wongtrakul-Kish, K.; Herbert, B. R.; Haynes, P. A.; Packer, N. H.

2026-05-05 cell biology 10.64898/2026.05.01.722121 medRxiv
Top 0.1%
7.3%

Adipogenesis is the process by which adipose-derived stem cells (ADSCs) respond to extracellular signals from the stem cell niche and differentiate into adipocytes (fat cells); it can be studied in vitro using a cocktail of chemicals that promotes adipogenic differentiation to produce differentiated ADSCs (dADSCs). The global membrane N- and O-glycosylation changes of this process have previously been analysed and compared with native adipocytes as a benchmark for a true adipocyte profile, revealing that bisecting GlcNAc-type N-glycans are characteristic of adipogenesis. As stem cell differentiation has been widely reported to result in cellular protein changes, here the membrane proteomes of the same cells (ADSCs, dADSCs and mature adipocytes) were characterised using label-free quantitative shotgun proteomics. The membrane proteome displayed greater differences in protein numbers between the cell types than the previously reported N-glycome, which had shown highly similar glycomes in stem cells and in vitro dADSCs, suggesting that the proteome is more dynamic during in vitro adipogenesis. Following the global shotgun analysis, a more targeted approach, proteomic analysis of de-N-glycosylated peptides from gel-separated proteins, revealed glycoproteins not detected in the shotgun analysis. This approach identified the adipogenic marker CD36, under-represented in the shotgun proteome, as the dominant (glyco)protein in the adipocyte membrane proteome; CD36 was also up-regulated at the mRNA transcript level in both in vitro differentiated ADSCs (7.1-fold increase) and mature adipocytes (102.9-fold increase).
A comparison of CD36 sequence coverage in the global shotgun analysis with that of de-N-glycosylated CD36 revealed a 41% increase when N-glycans were removed prior to trypsin digestion. This explains its increased observed abundance and highlights the crucial need to de-N-glycosylate proteins in proteomics experiments to improve glycoprotein identification. By integrating previously reported glycomics data with the proteomics and transcriptomics analyses in this work, this systems glycobiology approach extends the investigation of membrane protein glycosylation changes during adipose-derived stem cell differentiation. The work provides a framework for future glycoproteomics-based investigations into the differentiation of stem cells into adipocytes, and will help reveal related pathologies and potential therapeutic applications.
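De-N-glycosylation acts on Asn residues in N-X-S/T sequons. The sketch below, assuming PNGase F is the deglycosylating enzyme (the abstract does not name it), locates canonical sequons and applies the +0.984 Da Asn-to-Asp mass shift that PNGase F leaves at each released site, a common way to tag formerly glycosylated peptides. The helper names are hypothetical.

```python
import re

# Canonical N-X-S/T sequon with X != Pro; the lookahead lets overlapping sequons
# (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Monoisotopic mass increase when PNGase F converts a glycosylated Asn to Asp.
ASN_TO_ASP_DA = 0.984016

def sequon_sites(sequence):
    """1-based positions of Asn residues that sit in canonical sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]

def deglycosylated_mass(unmodified_mass, occupied_sites):
    """Peptide mass after PNGase F release, assuming every counted site was occupied."""
    return unmodified_mass + ASN_TO_ASP_DA * occupied_sites

print(sequon_sites("MNGTANPSQNVS"))  # N at position 6 is skipped because it precedes Pro
```

Non-enzymatic deamidation produces the same mass shift, so the Asp tag alone is not proof that a site was glycosylated.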

3
Trypsin exhibits exopeptidase-like activity toward N-terminal arginine that biases proteomic analyses

Ambrose, E. A.; Kandasamy, G.; Meulener, M. M.; Zhang, F.

2026-05-16 biochemistry 10.64898/2026.05.15.725550 medRxiv
Top 0.1%
5.0%

Many proteomics protocols rely on enzymatic digestion of complex protein mixtures to generate peptides with predictable cleavage patterns for mass spectrometry analysis. One of the most widely used enzymes, trypsin, is classically defined as a serine endopeptidase with high specificity for cleaving peptide bonds on the C-terminal side of internal lysine and arginine residues. Accordingly, trypsin is not expected to remove an N-terminal arginine, which may arise through post-translational modification such as arginylation or through proteolysis that exposes internal residues as new N-termini. N-terminal arginine plays important biological roles, including functioning as an N-degron and modulating protein interactions and signaling through its positive charge. Curiously, prior mass spectrometry-based studies using trypsin to identify proteins bearing N-terminal arginine have frequently reported low and inconsistent yields, suggesting a potential systematic bias in current proteomic approaches. Here, we explored whether trypsin affects the integrity of N-terminal arginine. Using antibodies that specifically recognize the N-terminal arginine of different peptides, together with mass spectrometry peptide analysis, we show that trypsin can remove N-terminal arginine residues in an exopeptidase-like manner. This effect occurs across a range of digestion conditions consistent with standard proteomic workflows, on peptides or whole proteins, and depends on trypsin concentration, incubation time, and catalytic activity. In addition, we show that the alternative arginine-cleaving enzyme Arg-C can also affect N-terminal arginine in a sequence-dependent manner. In contrast, Lys-C and LysargiNase do not exhibit such effects, providing suitable alternative digestion strategies. 
Together, these findings reveal an unappreciated enzymatic behavior of arginine-cleaving proteases and suggest that their widespread use may systematically compromise the detection of N-terminal arginine in proteomic studies.
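The classical specificity described above can be written as an in-silico digestion rule, here in the commonly used form "cleave after an internal K or R, but not before P". The second function is a deliberately crude, hypothetical model of the reported exopeptidase-like activity that simply drops a leading R; it ignores the reported dependence on enzyme concentration, incubation time and catalytic activity.

```python
import re

# Classical trypsin rule: cut after an internal K or R unless the next residue is P.
# The two-character lookbehind requires a residue before K/R, so an N-terminal R
# is never treated as a cleavage site.
TRYPSIN_SITE = re.compile(r"(?<=.[KR])(?!P)")

def tryptic_digest(sequence):
    """In-silico digest under the classical rule, with no missed cleavages."""
    return [peptide for peptide in TRYPSIN_SITE.split(sequence) if peptide]

def with_n_terminal_arg_loss(peptides):
    """Hypothetical worst case of the exopeptidase-like effect: drop any leading R."""
    return [p[1:] if p.startswith("R") and len(p) > 1 else p for p in peptides]

expected = tryptic_digest("RAGKPLRDEK")       # K before P is not cut
observed = with_n_terminal_arg_loss(expected)
print(expected, observed)
```

In this toy sequence the classical rule predicts RAGKPLR as the N-terminal peptide; once the Arg is trimmed, AGKPLR remains and would typically match as an ordinary peptide, so the evidence for the N-terminal Arg disappears rather than leaving an unexplained spectrum.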

4
LAMPrEY: a Python-based automated quality control tool for large-scale proteomics datasets

Valdes-Tresanco, M. E.; Wacker, S.; Valdes-Tresanco, M. S.; Plakhotnyk, A.; Brodie, N. I.; Hepburn, M.; Ulke-Lemee, A.; Huttlin, E. L.; Lewis, I. A.

2026-05-11 bioinformatics 10.64898/2026.05.06.722826 medRxiv
Top 0.1%
4.4%

In recent years, proteomics has moved increasingly towards the analysis of large cohorts of biological specimens. This has been made possible by significant improvements in mass spectrometry technology, chromatographic separation methods, and data acquisition strategies. These technological advances now routinely enable experiments that yield vast datasets that substantially outstrip the capacity of existing proteomics data analysis approaches. Processing such large datasets requires purpose-built quality control tools designed to organize and analyze the data while recording all processing parameters for reproducibility. To address this need, we developed an open-source, Python-based software platform, Large-scale Automated Multi-level Proteomics Evaluation by Python (LAMPrEY), a comprehensive quality-control pipeline for quantitative proteomics analyses of large cohorts of samples. LAMPrEY features GUI-based file submission, automated processing with MaxQuant and RawTools, an interactive analytics dashboard, and an application programming interface (API) for programmatic usage that collectively enable rapid, reproducible analysis and interpretation of proteomics data. We demonstrate the longitudinal monitoring and analytical capabilities of LAMPrEY using TMT11 quantitative proteomics data generated from 910 Enterococcus faecium isolates collected from bloodstream infection patients. LAMPrEY is open-source software and can be accessed at www.lewisresearchgroup.org/software.

5
Reference-Based Library Construction Improves Performance in low-input diaPASEF Workflows

Charkow, J.; Ghaznavi, M.; Seale, B.; Peng, J.; Gingras, A.-C.; Rost, H.

2026-05-04 bioinformatics 10.64898/2026.04.29.721088 medRxiv
Top 0.1%
4.0%

In low-input mass spectrometry-based proteomics, data-independent acquisition (DIA), including diaPASEF, is quickly becoming the method of choice for label-free quantification. Whether empirical or in silico spectral libraries are used, performance depends on the library; however, the optimal library construction strategy for low-input proteomics remains an open question. To address this, we examine and develop library construction approaches that are compatible with both spectrum-centric and peptide-centric analysis workflows. These approaches leverage a closely related, high-quality sample to improve library quality. First, we validated our approach with bulk sample amounts, where we observed that the effects of gas-phase fractionation-based library construction depend on the software framework, with improvements more pronounced in OpenSWATH than in DIA-NN. In OpenSWATH, our peptide-centric library reconstruction workflow consistently outperforms a transfer learning strategy, an emerging alternative approach. In DIA-NN, trends depend on the library source, highlighting OpenSWATH's stronger dependence on the search space. In low-input applications, such as single-cell-equivalent injection amounts (100 pg) of HeLa cell digest on a timsTOF SCP, our library construction approach provided more pronounced improvements across both software tools than in bulk samples. Using a peptide-centric reconstruction approach with the OpenSWATH analysis framework, we detected over 15,000 peptide precursors (2,480 protein groups), a 90% improvement over the original library. Furthermore, using a spectrum-centric construction approach, peptide precursor identifications improved over 6-fold (~1,000 to ~6,000). Our strategy provides a practical solution for generating high-quality libraries in low-input applications.

6
Development of a Xylene-Free Sample Preparation Protocol for Quantitative Proteomics of Clinically Relevant Formaldehyde-Fixed Paraffin-Embedded Needle Biopsy Samples

Moagi, M.; Beke, L.; Mehes, G.; Kecskemeti, G.; Szabo, Z.; Turiak, L.; Csosz, E.

2026-05-14 molecular biology 10.64898/2026.05.12.724492 medRxiv
Top 0.2%
3.7%

Fresh-frozen tissues are considered the gold standard for proteomic analyses due to superior preservation of protein integrity; however, their use is limited by the logistical and financial requirements of long-term storage. Formaldehyde-fixed paraffin-embedded (FFPE) tissues provide a practical alternative owing to their stability and widespread availability in clinical settings. Critical steps in FFPE proteomics are deparaffinization, which traditionally relies on organic solvents such as xylene, and efficient reversal of formaldehyde-induced crosslinks. In this study, we evaluated multiple FFPE protein extraction and digestion workflows, including chaotropic, surfactant-based, and detergent-free approaches, in combination with xylene-free deparaffinization strategies, using label-free data-independent acquisition (DIA) LC-MS/MS. Among the tested methods, a chaotropic-, reductant-, and surfactant-free in-solution digestion workflow demonstrated robust protein and peptide recovery. A modified version of this protocol further improved peptide coverage while maintaining comparable protein depth. The applicability of the optimized workflow was assessed using FFPE needle biopsy samples from control, hepatic steatosis, and liver fibrosis groups. Distinct proteomic patterns were observed across conditions, with hepatic steatosis associated with early activation of stress-response pathways, while fibrosis showed evidence suggesting altered lipid metabolism. Overall, this study presents a simple, xylene-free, and MS-compatible workflow for FFPE proteomics that is suitable for low-input clinical samples and may support broader application of archival tissues in proteomic research.

7
Predicting and Elucidating Peptide Retention Mechanisms with Graph Attention Networks

Kensert, A.; Hruzova, K.; Devreese, R.; Nameni, A.; Declercq, A.; Gabriels, R.; Martens, L.; Bouwmeester, R.; Urban, J.

2026-05-20 bioinformatics 10.64898/2026.05.18.725893 medRxiv
Top 0.2%
3.6%

Liquid chromatography (LC) is a key technology in bottom-up proteomics, separating proteolytic peptides to decrease sample complexity, enhance coverage, and increase the robustness of protein identification and quantification. Although high-resolution mass spectrometry has advanced significantly, comparable progress in LC has lagged, primarily due to a limited understanding of peptide-column interactions. To bridge this knowledge gap, we introduce a novel deep learning model (PeptideGNN) based on a graph neural network (GNN) architecture to model and elucidate peptide behavior across various separation conditions. After being trained to predict peptide retention times on ten diverse proteomic datasets, the model was interpreted with a saliency mapping technique to reveal the underlying retention mechanisms. Our model consistently outperformed existing retention-time predictors across multiple datasets, and the saliency mapping revealed insights into peptide-stationary phase interactions, highlighting the effects of neighboring amino acids, post-translational modifications (PTMs), chromatographic columns, and mobile phase additives on peptide retention.
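For readers unfamiliar with graph attention, the sketch below shows a generic single-head graph attention layer over a peptide represented as a chain of residues joined by peptide bonds. It is not PeptideGNN: the random residue features, the chain-only graph and the layer sizes are assumptions, and the retention-time readout and training loop are omitted. The attention weights `alpha` are one possible interpretability signal; the preprint instead reports saliency mapping.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU as used in graph attention scoring."""
    return np.where(x > 0, x, slope * x)

def gat_layer(features, adjacency, weight, attn):
    """Single-head graph attention layer.

    features:  (n_nodes, f_in) node features, e.g. per-residue descriptors
    adjacency: (n_nodes, n_nodes) 0/1 matrix; self-loops are added here
    weight:    (f_in, f_out) shared linear projection W
    attn:      (2 * f_out,) attention vector a
    Returns the updated node features and the attention matrix alpha.
    """
    n = features.shape[0]
    h = features @ weight                                   # W h_i for every node
    f_out = h.shape[1]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]), computed as a source term plus a target term
    scores = leaky_relu((h @ attn[:f_out])[:, None] + (h @ attn[f_out:])[None, :])
    neighbours = (adjacency + np.eye(n)) > 0
    scores = np.where(neighbours, scores, -np.inf)          # attend only to graph neighbours
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)        # softmax over each neighbourhood
    return alpha @ h, alpha

# Toy 7-residue peptide: a linear chain graph with random (hypothetical) features
rng = np.random.default_rng(0)
n_res = 7
chain = np.zeros((n_res, n_res))
idx = np.arange(n_res - 1)
chain[idx, idx + 1] = 1.0
chain[idx + 1, idx] = 1.0
x = rng.normal(size=(n_res, 4))
out, alpha = gat_layer(x, chain, rng.normal(size=(4, 8)), rng.normal(size=16))
```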

8
ProCAST: A Bioinformatics Suite for Mass Spectrometry-Based Protein Corona Proteomics Analysis

Mun, H.; Leamy, M.; Kaushik, A.; Kieslich, C.; Douglas-Green, S. A.

2026-05-12 bioinformatics 10.64898/2026.05.08.723620 medRxiv
Top 0.2%
3.6%

When nanoparticles are exposed to biological fluids, they spontaneously adsorb proteins, forming a protein corona that defines their biological identity and dictates cellular uptake, biodistribution, and toxicity. Characterizing protein coronas typically involves proteomics approaches (e.g., LC-MS/MS) that identify proteins and generate long lists of adsorbed proteins, often visualized as complex heatmaps. While heatmaps display the data, they offer little heuristic guidance, leaving the mechanisms driving adsorption unknown. Moreover, interpretation of protein corona proteomics data remains limited by fragmented workflows, inconsistent preprocessing, and visual outputs that are often descriptive rather than readily interpretable. These conventional methods identify adsorbed proteins but do not explain why specific proteins are selected or how they influence the particles' biological fate. Here, we developed ProCAST (Protein Corona Analysis and Statistical Tool), an R-based framework for protein corona proteomics that integrates proteomics data, nanoparticle metadata, protein annotations, and multi-level visualization within a single analytical workflow. ProCAST clusters abundant proteins based on sample conditions, sequence descriptors, and property or protein correlations, and provides gene ontology-based functional visualization. It also distinguishes abundant proteins from frequent proteins, providing distinct layers of information from the same dataset. ProCAST was used to re-analyze previously published PAMAM G4 dendrimer-FBS datasets, demonstrating that it reproduces descriptor-level visualizations and offers new insights through clearer comparisons of functional patterns and hypothesis generation from dominant corona proteins. By organizing results as complementary views of the same dataset, ProCAST facilitates the shift of protein corona analysis from descriptive outputs toward structured, comparative, and experimentally testable interpretations.
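The abundant-versus-frequent distinction can be illustrated with two per-protein summaries computed from an intensity matrix. The definitions used here (mean intensity over the samples in which a protein was detected, and the fraction of samples in which it was detected) are assumptions, as is the toy data; ProCAST itself is R-based and its exact definitions may differ.

```python
import numpy as np

def abundance_and_frequency(intensity, proteins):
    """Summarize each protein by abundance and detection frequency.

    intensity: (n_proteins, n_samples) array with NaN where a protein was not detected.
    Returns {protein: (mean intensity when detected, fraction of samples detected)}.
    """
    detected = ~np.isnan(intensity)
    frequency = detected.mean(axis=1)
    mean_when_detected = np.array([
        row[mask].mean() if mask.any() else 0.0
        for row, mask in zip(intensity, detected)
    ])
    return {p: (float(a), float(f))
            for p, a, f in zip(proteins, mean_when_detected, frequency)}

# Toy values for four corona samples (hypothetical)
toy = np.array([[50.0, 40.0, np.nan, np.nan],
                [5.0, 6.0, 4.0, 5.0],
                [np.nan, np.nan, np.nan, np.nan]])
summary = abundance_and_frequency(toy, ["ALB", "APOA1", "FGA"])
```

In the toy matrix, ALB is the most abundant protein but is detected in only half of the samples, whereas APOA1 is less abundant but present in every sample, so ranking by abundance and ranking by frequency give different answers.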

9
Capillary-based Subcellular Sampling Uncovers the Stress Granule Proteome in Single Cells

Davison, C.; Locker, N.; Marques, M.; Kelly, S.; Relton, E.; Sharma, T.; Fraser, E.; Aragon Fernandez, P.; Schoof, E. M.; Petersen, M.; Pascoe, J.; Lilley, K. S.; Pinto, S. M.; Spick, M.; Bailey, M.

2026-05-13 cell biology 10.64898/2026.05.11.724230 medRxiv
Top 0.2%
3.6%

Many diseases arise from dysfunction within specific organelles or biomolecular condensates, highlighting the value of analysing proteins at subcellular resolution to uncover new biological mechanisms. We report a novel capillary-based subcellular sampling workflow coupled with liquid chromatography-mass spectrometry (LC-MS) for proteomic analysis of defined subcellular regions of individual cells. We applied this methodology to stress granules (SGs), membrane-less biomolecular condensates that form in response to cellular stress (including viral infection), and are implicated in infection, neuropathology and cancer. Comprehensive characterisation of SG protein composition remains limited by technical challenges associated with bulk purification, including loss of spatial context, dynamic behaviour and contamination from cytosolic material. Using our novel method, we identified a high-confidence set of 405 SG-associated proteins, including 46 established SG residents alongside numerous previously unreported candidates. Functional enrichment analysis revealed pathways consistent with known SG biology, while comparison with an independent cytosolic proteome dataset demonstrated minimal overlap, supporting the specificity of the sampling strategy. Selected novel SG protein candidates (AHNAK2, DDX39B, NUDT1 and FKBP2) were validated using immunofluorescence microscopy. These findings establish capillary-based subcellular sampling as a viable approach for proteomic analysis of SGs with preserved spatial context and provide a framework for analysing other subcellular compartments. Table of contents: We report an LC-MS-based capillary sampling workflow for proteomic analysis of subcellular structures within single cells. This methodology identified 405 high-confidence stress granule-associated proteins, including 46 previously established and numerous novel candidates.
The approach demonstrated high specificity and preserved spatial context, expanding the capabilities of subcellular proteomics. Figure made in Biorender.com.

10
Stoichiometry-dependent specificity in biotin enrichment: a benchmarking framework for proximity labeling proteomics

Zala, C. A.; Trueba Sanchez, M. C.; van den Bor, J.; Willemsens, T.; Verweij, F. J.; Altelaar, M.; Stecker, K.

2026-05-11 molecular biology 10.64898/2026.05.07.723439 medRxiv
Top 0.2%
2.7%

Proximity labeling methods (including BioID, TurboID and ultraID), along with surface proteomics and microdomain mapping, enable proteome-wide identification of spatially proximal proteins via MS-based analysis. These workflows require specific enrichment of biotinylated proteins by affinity purification, yet enrichment specificity is often compromised by non-specifically bound proteins. As labeling strategies are increasingly applied to complex biological samples with low protein input or low biotin stoichiometry, accurately distinguishing true targets from background becomes a major analytical challenge. Despite its critical impact on data quality and interpretation, the influence of biotinylation level and protein input on enrichment performance remains poorly characterized, limiting the reliability of proximity labeling experiments. To address this, we establish a quantitative benchmarking framework that systematically evaluates biotin enrichment under controlled conditions, including scenarios of low biotin stoichiometry. Using this setup, we show that enrichment specificity strongly depends on biotin stoichiometry: higher biotinylation levels yield high specificity, whereas low biotinylation increases non-specific background. Reduced protein input further limits recovery of true targets, yet maintains enrichment specificity, highlighting sensitivity constraints of enrichment-based workflows. We apply this framework to biotinylated extracellular vesicle (EV) cargo uptake in recipient cells using ultraID-CD63 labeling. Detection of only the most abundant EV cargo proteins under low biotinylation conditions indicates that current workflows approach the lower bounds of biotin enrichment sensitivity. Together, these standards provide a practical reference for evaluating and optimizing biotin enrichment workflows, supporting quantitative and reproducible proximity labeling in proteomics.

11
From Peaks to Power: Systematic Evaluation of Chromatographic Sampling Reveals Determinants of Quantification and Biological Discovery in DIA Proteomics

Cantrell, L. S.; Just, S.; Stukalov, A.; Farokhzad, O. C.; Batzoglou, S.

2026-05-16 bioinformatics 10.64898/2026.05.13.724964 medRxiv
Top 0.2%
2.7%

Modern DIA proteomics increasingly emphasizes throughput and depth for large-cohort studies, but methods are often optimized using proxy metrics that can mask losses in quantifiable signal and statistical power. Here, we evaluate how datapoints per peak and other chromatographic features jointly contribute to quantification and downstream biological discovery. Using a matrix-matched calibration curve dataset, we examined how the number of datapoints per peak (DPPP) affects the limits of detection and quantification (LOD/LOQ). Reducing DPPP minimally affected LOD but substantially degraded LOQ. Feature modeling and nonparametric association analyses identified precursor peak area as the strongest feature-level predictor of LOQ, whereas DPPP showed weaker and context-dependent effects. Simulations of chromatographic peak integration recapitulated these trends, showing that increased sampling primarily improves integration precision, while quantitative accuracy is strongly governed by peak height and peak shape. Finally, when comparing 20 cancer and 20 control plasma samples processed with Seer Proteograph, decreasing DPPP led to a loss of statistical significance for proteins with low-abundance precursors. These findings argue that DIA optimization should prioritize LOQ and statistical power metrics, not identifications alone, by balancing sampling density with chromatographic peak height and quality to maximize useful biological signal.
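The relationship between sampling density and integration precision can be explored with a small simulation in the spirit of the one described above, though it is not the authors' code. DPPP is approximated as peak width divided by cycle time, and each replicate samples a unit-height Gaussian peak at a random phase relative to the scan grid, with additive noise; the noise level, the +/-3 sigma window and the replicate count are hypothetical.

```python
import numpy as np

def datapoints_per_peak(peak_width_s, cycle_time_s):
    """Approximate DPPP: chromatographic peak width divided by the DIA cycle time."""
    return peak_width_s / cycle_time_s

def trapezoid(y, t):
    """Trapezoidal integration of samples y taken at times t."""
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(t)))

def integration_cv(dppp, n_rep=2000, noise_sd=0.05, seed=1):
    """CV of integrated areas for a unit-height Gaussian peak (sigma = 1).

    Each replicate places `dppp` scans evenly across +/-3 sigma, shifts the apex by a
    random fraction of the scan spacing and adds independent Gaussian noise per scan.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(-3.0, 3.0, dppp)
    spacing = t[1] - t[0]
    areas = np.empty(n_rep)
    for i in range(n_rep):
        apex = rng.uniform(-spacing / 2, spacing / 2)   # apex position between scans
        y = np.exp(-0.5 * (t - apex) ** 2) + rng.normal(0.0, noise_sd, dppp)
        areas[i] = trapezoid(y, t)
    return areas.std(ddof=1) / areas.mean()

print(datapoints_per_peak(6.0, 0.5))             # 6 s peak, 0.5 s cycle
print(integration_cv(4), integration_cv(12))
```

Under these assumptions the coefficient of variation at 4 points per peak is clearly larger than at 12, mainly because per-scan noise is averaged over more points; the toy model says nothing about LOQ, which the preprint found to depend more strongly on precursor peak area.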

12
A framework for peptide identification on commercial nanopore sequencing platforms

Beslic, D.; Kucklick, M.; Graap, E.; Sedaghatjoo, S.; Renard, B. Y.; Fuchs, S.; Engelmann, S.; Koerber, N.

2026-05-21 bioinformatics 10.64898/2026.05.19.726067 medRxiv
Top 0.2%
2.6%

Direct single-molecule peptide analysis could in principle enable rapid and sensitive identification of pathogen-derived or disease-associated biomarkers without reliance on mass spectrometry. However, existing nanopore peptide sensing methods are typically constrained by limited throughput and lack of accessibility beyond specialized setups. Here, we present an integrated experimental-computational framework for DNA-linked peptide translocation on a commercially available, high-throughput nanopore sequencing platform, the MinION. Synthetic peptides were covalently bound to oligonucleotides at both termini. The resulting peptide-DNA constructs were then translocated through the CsgG-CsgF pores using a DNA motor protein. Current traces were segmented using the known DNA sequences to extract peptide-associated signal regions. From these segments, we extracted signal features and trained feature-based and deep-learning classifiers to distinguish peptides, balancing interpretability and classification performance. We establish a framework for peptide identification using standard nanopore sequencing hardware. Across a diverse panel of synthetic peptides, our approach resolves single-amino-acid substitutions, maintains performance across independent sequencing runs, and correctly identifies peptides in blind mixtures. Interpretable model analyses connect classifier decisions and common errors to specific signal motifs. By combining commercially available instrumentation with a reproducible experimental and computational workflow, this framework lowers the barrier to nanopore-based proteomics and enables broader adoption across laboratories. It provides a foundation for future developments in amino acid modification detection and sequence analysis.
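A feature-based classifier of the kind mentioned above can be sketched with simple per-segment summary statistics and a nearest-centroid rule. The segment boundaries are assumed to come from the DNA-guided segmentation (not shown), the peptide names and current levels are simulated, and nearest-centroid classification is a stand-in for whichever feature-based and deep-learning models the authors trained.

```python
import numpy as np

def segment_features(current, start, end):
    """Summary features of one peptide-associated segment of a current trace.

    `start`/`end` are sample indices, assumed to come from aligning the flanking
    DNA sequences (that alignment step is not shown here).
    """
    seg = np.asarray(current[start:end], dtype=float)
    return np.array([seg.mean(), seg.std(), np.median(seg), float(len(seg))])

class CentroidClassifier:
    """Minimal feature-based classifier: assign each segment to the closest class mean."""

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        self.mu_, self.sd_ = X.mean(axis=0), X.std(axis=0) + 1e-12
        Z = (X - self.mu_) / self.sd_                    # z-score so features are comparable
        self.classes_ = sorted(set(labels))
        labels = np.asarray(labels)
        self.centroids_ = np.array([Z[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mu_) / self.sd_
        dists = np.linalg.norm(Z[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.classes_[i] for i in dists.argmin(axis=1)]

# Simulated segments for two hypothetical peptides with different mean currents (pA)
rng = np.random.default_rng(7)
levels = {"PEP_A": 80.0, "PEP_B": 90.0}

def simulated_segments(level, n):
    return [rng.normal(level, 2.0, size=int(rng.integers(200, 400))) for _ in range(n)]

train_X, train_y = [], []
for name, level in levels.items():
    for trace in simulated_segments(level, 20):
        train_X.append(segment_features(trace, 0, len(trace)))
        train_y.append(name)
model = CentroidClassifier().fit(train_X, train_y)
```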

13
Extending structural surfaceomics to identify aberrant conformations of tumor surface proteins as potential immunotherapy targets

Kishishita, A.; Cismoski, S.; Grant, T.; Deo, R.; Prudhvi, S.; Sue, C.; Barpanda, A.; Yu, C.; Shenoy, S.; Berman, S.; Reeves, A. G.; Li, H.; Liu, T.; Naik, A.; Biswas, D.; Jiao, F.; He, Y.; Hancock, M.; Dalal, R.; Zalevsky, A.; Hoopmann, M. R.; Ye, C. J.; Viner, R. I.; Feng, F.; Mandal, K.; Moritz, R. L.; Echeverria Riesco, I.; Sali, A.; Wells, J. A.; Srivastava, S.; Huang, L.; Wiita, A. P.

2026-05-18 cancer biology 10.64898/2026.05.15.721813 medRxiv
Top 0.3%
2.3%

The complement of tumor cell surface proteins, or "surfaceome", is a rich source of potential immunotherapy targets. To move beyond expression-based target discovery, we previously described "structural surfaceomics," combining crosslinking mass spectrometry (XL-MS) with surface protein biotinylation to identify conformation-selective targets. In our prior work, we applied this method to a single model of acute myeloid leukemia (AML), identifying active integrin beta-2 as a promising target. Here, we expand structural surfaceomics to identify further immunotherapy targets and characterize surface protein biology across additional models of AML, multiple myeloma, and prostate cancer, as well as donor peripheral blood mononuclear cells. Using these models and different chemical crosslinkers, we compile an extensive database of 5,209 crosslinks. We characterize both shared and unique crosslink-based features, identifying 1,612 disease model-specific crosslinks, including 212 that potentially define tumor-specific conformations based on distance constraint violations relative to AlphaFold predictions. We further implement a suite of emerging modeling tools to predict tumor-specific protein structures. We probe crosslinking patterns suggesting multiple myeloma-specific CD48 and AML-specific integrin 1/β4 heterodimer conformations. Through the implementation of structural surfaceomics, this work establishes a resource for cancer structural biology. Our findings also point toward more realistic protein design models, potentially enabling systematic detection of targetable cancer-specific epitopes for next-generation immunotherapies.
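Distance constraint violations of the kind used to flag candidate tumor-specific conformations can be checked by comparing each crosslinked residue pair against a structural model. In this sketch the Cα coordinates would come from a predicted structure such as an AlphaFold model (parsing is not shown), the residue labels and coordinates are invented, and the 30 Å cut-off is an assumption, a value often used for lysine-reactive crosslinkers; the appropriate limit depends on the crosslinker.

```python
import numpy as np

def ca_distance(coords, res_a, res_b):
    """Euclidean distance (in angstroms) between the C-alpha atoms of two residues."""
    return float(np.linalg.norm(np.asarray(coords[res_a]) - np.asarray(coords[res_b])))

def flag_violations(crosslinks, coords, max_distance=30.0):
    """Return (residue_a, residue_b, distance) for crosslinks the model cannot satisfy.

    crosslinks:   iterable of (residue_a, residue_b) keys into `coords`
    coords:       {residue: (x, y, z)} C-alpha coordinates from a structural model
    max_distance: hypothetical C-alpha to C-alpha limit for the crosslinker used
    """
    violations = []
    for res_a, res_b in crosslinks:
        distance = ca_distance(coords, res_a, res_b)
        if distance > max_distance:
            violations.append((res_a, res_b, distance))
    return violations

# Toy model coordinates (hypothetical residues and positions)
model_coords = {"K12": (0.0, 0.0, 0.0), "K40": (10.0, 0.0, 0.0), "K88": (0.0, 40.0, 0.0)}
flagged = flag_violations([("K12", "K40"), ("K12", "K88")], model_coords)
```

A crosslink that exceeds the limit in the predicted structure is consistent with, but does not prove, an alternative conformation; inter-molecular crosslinks can produce the same signature.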

14
Manchester Proteome Profiler: A User-Friendly Platform for Quantitative Proteomic Analysis

Cain, S. A.; Fatima, M.; Humphries, M.

2026-05-18 bioinformatics 10.64898/2026.05.14.725092 medRxiv
Top 0.3%
2.2%

Manchester Proteome Profiler (MPP) is an open-source R Shiny application that streamlines downstream analysis of quantitative proteomic data. Compatible with protein-group intensity tables from MaxQuant, FragPipe, Proteome Discoverer and other custom layouts, MPP provides an integrated platform for filtering, normalisation, imputation, differential expression analysis and cluster analysis across user-chosen experimental conditions. MPP supports both single- and dual-dataset comparisons, incorporates SAINTexpress for affinity purification and proximity labelling experiments, and links clusters of significant proteins to downstream functional enrichment and interaction network analysis via Gene Ontology, BioGRID and STRING. Benchmarking with a KRAS proximity biotinylation dataset demonstrated the ability of MPP to identify reproducible clusters of differentially expressed proteins and reveal biologically meaningful patterns, including enrichment of solute carrier transporters and adhesion molecules. With interactive visualisations, customisable reports, and support for complex experimental designs, MPP offers a novel, versatile and user-friendly environment for proteomic data exploration and hypothesis generation.

15
A unified framework for batch correction and missing data handling in large-scale and single-cell mass spectrometry proteomics

Anwar, A. M.; Bayoumi, S.; Lahti, L.; Coffey, E.

2026-05-21 bioinformatics 10.64898/2026.05.19.726178 medRxiv
Top 0.3%
2.1%

Large-scale mass spectrometry (MS)-based proteomics, including single-cell proteomics, is routinely affected by technical variation arising from discrete batch effects, inter-laboratory differences and continuous signal drift during data acquisition. Current correction strategies typically address these sources of unwanted variation independently and often require either removal of proteins with missing values or imputation before correction, both of which may lead to information loss and potential amplification of technical bias. Here we present NMFBatch, a unified statistical framework that simultaneously models discrete and continuous unwanted variation in bulk and single-cell proteomics data. NMFBatch integrates non-negative matrix factorization with generalized additive modelling and directly accommodates missing values, thereby enabling both on-the-fly imputation during correction and optional post-correction imputation. Benchmarking against six batch-correction methods using multi-laboratory reference datasets and a large plasma proteomics cohort shows that NMFBatch consistently reduces batch-associated variation while preserving biological structure under both balanced and confounded experimental designs. Application to single-cell proteomics data further showed effective reduction of TMT- and acquisition-associated variation while retaining biologically meaningful clustering. Together, these results establish NMFBatch as a flexible framework for modelling unwanted variation in proteomics experiments, with potential applications in cross-cohort harmonization and integrative proteomics analysis.
Graphical abstract created in BioRender. Youssef, A. (2026) https://BioRender.com/c1q1yxt
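The abstract states that NMFBatch handles missing values directly inside a non-negative matrix factorization, rather than removing or imputing them first. One well-known way to get that property is masked (weighted) NMF: the reconstruction error is summed only over observed entries, and missing cells are read off the fitted product afterwards. The sketch below shows just that idea with Lee-Seung-style multiplicative updates in plain Python. It is an assumption-laden illustration: it contains none of NMFBatch's generalized additive model or batch/drift terms, and `masked_nmf` is a hypothetical name.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def masked_nmf(x: List[List[Optional[float]]], rank: int, n_iter: int = 500,
               seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix, Matrix]:
    """Masked NMF with multiplicative updates (illustrative, not NMFBatch).

    The squared error is summed over observed entries only (None = missing),
    so missing values never enter the fit; afterwards they are filled from W @ H.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    mask = [[0.0 if v is None else 1.0 for v in row] for row in x]
    xo = [[0.0 if v is None else float(v) for v in row] for row in x]  # equals mask * X
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def masked_product(w_: Matrix, h_: Matrix) -> Matrix:
        wh = matmul(w_, h_)
        return [[mask[i][j] * wh[i][j] for j in range(m)] for i in range(n)]

    for _ in range(n_iter):
        # H <- H * (W^T (M*X)) / (W^T (M*(WH)))
        wt = transpose(w)
        num, den = matmul(wt, xo), matmul(wt, masked_product(w, h))
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        # W <- W * ((M*X) H^T) / ((M*(WH)) H^T)
        ht = transpose(h)
        num, den = matmul(xo, ht), matmul(masked_product(w, h), ht)
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]

    wh = matmul(w, h)
    filled = [[wh[i][j] if x[i][j] is None else float(x[i][j]) for j in range(m)]
              for i in range(n)]
    return w, h, filled


# Toy rank-1 matrix (outer product of [1, 2, 3] and [1, 2, 4]) with one missing cell.
x = [[1.0, 2.0, 4.0], [2.0, None, 8.0], [3.0, 6.0, 12.0]]
_, _, filled = masked_nmf(x, rank=1)
```

Because the mask multiplies both numerator and denominator, unobserved cells contribute nothing to the updates, which is the property that lets imputation happen "on the fly" from the factorization rather than as a separate preprocessing step.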

16
Extraction-dependent bone proteomics reveals distinct stable and dynamic protein modules during early post-exposure degradation

Najar, M. A.; Choudhary, N.; Abdulsalam, S.; Sajeevan, A.; Ahmad, M. N.

2026-05-04 systems biology 10.64898/2026.04.29.721604 medRxiv
Top 0.3%
2.1%

Bone is a highly durable biological tissue widely used in forensic, archaeological, and anthropological investigations; however, efficient protein recovery and understanding of protein stability over time remain major challenges in skeletal proteomics. Here, we systematically evaluated three bone protein extraction workflows and integrated them with data-independent acquisition (DIA) mass spectrometry to assess proteome coverage, reproducibility, and temporal protein dynamics under environmentally exposed conditions. Comparative analysis demonstrated that extraction strategy is a primary determinant of detectable proteome composition. EDTA-based demineralization followed by SDS extraction provided the deepest proteome coverage and highest reproducibility, whereas guanidine hydrochloride extraction preferentially enriched collagen and extracellular matrix proteins. In contrast, acid-based extraction yielded limited protein recovery. Temporal profiling of bone samples collected at 10 and 45 days post-exposure revealed two distinct protein classes. A temporally stable module, enriched in collagens and extracellular matrix proteins including COL1A2, COL5A2, BGN, SPARCL1, and NID2, exhibited minimal abundance change, indicating resistance to environmental degradation. In contrast, temporally dynamic proteins, enriched in mitochondrial, metabolic, and intracellular pathways such as ACO2, OGDH, PDHA1, ATP5PO, and PFKM, showed marked decline over time. These findings support a two-compartment model of bone protein preservation in which matrix-embedded proteins are preferentially retained while exposed intracellular proteins undergo progressive degradation. Collectively, this study establishes an integrated framework linking extraction methodology with temporal proteome stability and identifies candidate markers for skeletal preservation assessment and temporal biomarker development in forensic and archaeological applications.

17
Simultaneous single-cell profiling of the transcriptome and proteome

Xu, X.; Caggiano, M. P.; Wells, M. L.; Sun, G.; Lim, S. M.; Multari, D. H.; Blundell, S. A.; Hartel, N.; Viner, R.; Polo, J. M.; Schittenhelm, R.; de Marco, A.

2026-05-15 systems biology 10.64898/2026.05.14.724921 medRxiv
Top 0.3%
1.9%

Transcriptomic and proteomic measurements from the same single cell provide complementary information that cannot be inferred from either modality alone, yet methods for the parallel recovery of both analyte classes from a single-cell lysate remain limited. Here, we describe a workflow in which individual cells are isolated by automated dispensing into a minimal, MS-compatible lysis volume, followed by sequential mRNA capture and protein supernatant recovery prior to independent downstream processing. The method is compatible with standard library preparation and data-independent acquisition proteomics pipelines and requires no dedicated instrumentation beyond a single-cell dispensing platform. We evaluated workflow performance on 67 single cells across 3 iBlastoids. Transcriptomic sequencing detected a median of 5,375 genes per cell, and proteomic analysis identified a median of 2,123 protein groups per cell across two mass spectrometry platforms. Compared with a standalone single-cell proteomics protocol, incorporating the mRNA extraction step reduced median proteomic depth by approximately 11% (median 1,965 vs. 2,204 protein groups per cell), while mean per-cell identification remained comparable across workflows (1,790 vs. 1,775 protein groups per cell). Direct comparison of paired transcript and protein abundance yielded a median Spearman correlation of ρ ≈ 0.38; after correction for detection depth, the partial correlation was 0.067.
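The quoted "approximately 11%" follows from the two medians: (2,204 − 1,965) / 2,204 ≈ 10.8%. The abstract does not say how the depth-corrected partial correlation was computed, so the sketch below only shows standard building blocks under that caveat: Spearman's ρ as the Pearson correlation of average ranks, and the textbook first-order partial correlation of x and y controlling for a covariate z (here assumed to stand for detection depth). All function names are illustrative.

```python
import math
from typing import List, Sequence


def percent_reduction(baseline: float, new: float) -> float:
    """Relative drop of `new` versus `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


print(round(percent_reduction(2204, 1965), 1))  # 10.8, i.e. "approximately 11%"
```

Feeding rank-based (Spearman) coefficients into `partial_corr` gives a partial Spearman correlation. A large drop from the raw ρ to the partial value would indicate that much of the transcript-protein agreement is shared dependence on detection depth.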

18
Learning from Drops: AI-Guided Integration of Liquid Biopsy Features in Cancer Studies

Andueza, M.; Villoslada-Blanco, P.; De Dreuille, B.; Alonso, L.; Sabroso-Lasa, S.; Pantel, K.; Alix-Panabieres, C.; Lopez de Maturana, E.; Malats, N.

2026-05-17 bioinformatics 10.64898/2026.05.12.724535 medRxiv
Top 0.4%
1.7%

Cancer is a major global health issue with rising incidence and mortality. Early detection, tumor characterization, and disease surveillance are crucial for timely and effective treatment, ultimately reducing mortality. Liquid biopsy (LB) has emerged as a valuable tool, offering a non-invasive means of measuring tumor-derived biomarkers in body fluids, with demonstrated translational potential. High-throughput sequencing platforms used to increase biomarker sensitivity deliver massive volumes of data, and artificial intelligence (AI) is pivotal for integrating such large, complex datasets. This contribution aims to assess the current state of integrative AI-based research in the LB field and to provide methodological guidance. First, we conducted a PubMed search and found that studies integrating LB features are sparse, particularly those applying AI. When adopting such an approach, defining the study objectives is crucial to guide the subsequent methodological choices, including study design, patient selection criteria, sample size, the nature of the LB features, and the metadata to collect. Specifically, we propose strategies and tools for data preprocessing, including normalization and batch correction, as well as for handling outliers and missing data. Furthermore, we recommend machine and deep learning approaches, together with feature selection techniques, to ensure model robustness, and we highlight the importance of rigorous internal and external validation of the selected models. Assessing clinical utility and interpretability is often overlooked but is fundamental for real-world implementation. In conclusion, we provide the LB scientific community with AI-based methodological guidance to bridge the two fields and enhance the integrative analysis of LB features.
Graphical abstract: Workflow chart for multi-omics integrative studies in the liquid biopsy field. Note: CTCs, circulating tumor cells; ctDNA, circulating tumor DNA; TEPs, tumor-educated platelets; miRNA, microRNA; cfRNAs, cell-free RNAs.

19
Genome-wide protein-protein interaction analysis between aquaporins and harpin (HrpZ2) through molecular docking and MD simulations to unravel its role in growth and stress management in tomato.

Lal, K.; Sinha, T.; Anand, S.; Kumar, G.; Mishra, A.; Dey, D.

2026-05-07 plant biology 10.64898/2026.05.04.722745 medRxiv
Top 0.4%
1.5%

HrpZ2, a harpin protein produced by the gram-negative plant-pathogenic bacterium Pseudomonas syringae, elicits a hypersensitive response and pathogen defense in non-host plants. Harpins from different bacterial sources elicit varying responses in different non-host plants owing to their structural variations, and their precise mechanisms of action are not yet fully understood. Previous reports show that harpins from diverse bacterial sources interact with distinct members of a family of integral membrane proteins known as aquaporins. For example, the harpin Hpa1Xoo interacts with OsPIP1;3 in rice, whereas in Arabidopsis the harpins Hpa1 and HrpZ interact with AtPIP1;4 and AtPIP1;3, respectively. Here, we conducted the first genome-wide computational screen of protein-protein interactions between HrpZ2 and all 47 tomato aquaporins. Molecular docking identified nine interactors across five aquaporin subfamilies, with HrpZ2 N-terminal residues mediating these interactions. We validated these interactions via molecular dynamics (MD) simulations, principal component analysis, and free energy landscape analysis, assessing stability (RMSD, RMSF, radius of gyration), dynamics, and affinity (MM-GBSA). PIP complexes, especially PIP2;1 (-460.46 kcal/mol) and PIP1;7 (-303.82 kcal/mol), exhibited superior stability, compactness, and well-defined energy minima, confirming PIPs as primary sensors of harpins. Non-PIP aquaporins such as TIP1;1 and NIP4;1 showed moderate stability, outperforming weaker interactors (SIP2;1, XIP1;5, XIP1;3). These findings provide robust evidence that HrpZ2 preferentially targets PIPs in tomato while engaging TIPs and NIPs as auxiliary partners. This multifaceted interaction profile of harpins suggests complex plant-pathogen recognition that modulates aquaporin-mediated cellular responses such as growth and stress management in plants.

20
Breast cancer is linked to changes in the urinary extracellular vesicle proteome

Laziri, N.; Zainurin, N. A. A.; Bambarandhage, A. U. K. H.; Fatudimu, O. S.; Gate, T.; Tench, H.; Fu, D.; Zhang, X.; Beckmann, M.; Phillips, H.; Pennick, M.; Morphew, R. M.; Mur, L. A.

2026-05-12 genetic and genomic medicine 10.64898/2026.05.08.26352674 medRxiv
Top 0.4%
1.4%

Breast cancer (BC) remains a leading cause of morbidity and mortality worldwide. Early detection remains the most effective strategy for improving prognosis. We explored the urinary extracellular vesicle (uEV) proteome for BC-associated changes that could also serve as potential biomarkers. Urine samples were collected from 20 participants across four groups (n = 5 each): newly diagnosed BC patients, benign breast disease (BBD) patients, individuals with breast cancer symptoms (symptom control, SC), and age-matched healthy controls (HC). EVs were isolated using size exclusion chromatography, and extracted proteins were analysed using a GeLC proteomic approach. Proteins were identified and quantified using Proteome Discoverer and further analysed using MetaboAnalystR, FunRich and Metascape. A total of 256 proteins were identified from the uEV preparations. Comparisons of BC with BBD, SC and HC identified 7 differentially expressed proteins (DEPs): SERPINB1 (serpin family B member 1), LCN1 (lipocalin 1), SIRPA (signal regulatory protein alpha), ACTB (actin beta), YWHAZ (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta), JCHAIN (immunoglobulin J chain) and APOA1 (apolipoprotein A1). Receiver operating characteristic (ROC) curve analysis suggested that each DEP had an area under the curve (AUC) > 0.8. These findings highlight EV-derived proteins as promising non-invasive biomarkers for breast cancer detection, warranting further validation in larger cohorts.
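With n = 5 per group, the AUC for a single pairwise comparison is computed over only 5 × 5 = 25 case-control pairs, so without ties it can only move in steps of 1/25 = 0.04. The sketch below computes the empirical AUC as the fraction of case-control pairs ranked correctly (the Mann-Whitney formulation), with ties counted as one half. It assumes higher abundance indicates disease. The abundance values are invented for illustration and are not data from the study, and `roc_auc` is a hypothetical helper name.

```python
from typing import Sequence


def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC = P(case > control) + 0.5 * P(tie), over all case-control pairs.

    Assumes higher values indicate disease; for a protein that is lower in BC,
    pass negated abundances (or use 1 - AUC).
    """
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical log-abundances for one protein, NOT data from the study.
bc = [8.1, 7.9, 9.0, 8.4, 7.2]
hc = [7.0, 7.5, 6.8, 7.3, 8.0]
print(roc_auc(bc, hc))  # 0.84: 21 of the 25 pairs are correctly ordered
```

In this toy case, moving a single control value above one more case would drop the AUC from 0.84 to 0.80, which is why small-cohort AUCs near a 0.8 threshold benefit from the larger validation cohorts the authors call for.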