PROTEOMICS — Latest Matching Preprints

1

Community Resource: A Genome-Based Extension of Large-Scale Wheat Proteogenomics

Vincent, D.; Appels, R.

2026-07-08 plant biology 10.64898/2026.06.17.733048 medRxiv

Top 0.1%

19.1%

Bread wheat (Triticum aestivum L.) possesses a large and highly repetitive allohexaploid genome, and its annotation requires extensive protein-level validation. We developed a genome-based wheat proteogenomics workflow integrating large-scale MS/MS reanalysis, GFF3-based peptide coordinate reconstruction, thorough validation, and genome browser-compatible peptide deployment against the IWGSC RefSeq v2.1 reference genome. Public wheat proteomics datasets comprising 577 raw mass spectrometry files (~1.0 TB) from 32 tissues were reprocessed using FragPipe/MSFragger, generating 2,226,779 non-redundant peptides and 1,648,740 unique protein accessions. Peptide-to-genome projections using GFF3 annotation files produced 8,291,056 projected genomic peptide rows, of which 98.14% passed validation procedures. Overall, peptide evidence supported 103,095 high-confidence (HC) and 135,495 low-confidence (LC) wheat gene models, corresponding to 96.4% and 84.7% of all parsed HC and LC annotations, respectively. In total, 238,590 wheat gene models (89.4% of all parsed annotations) received protein-level support. Apollo/JBrowse-compatible BED tracks enabled exon-resolved visualisation of peptide evidence across wheat chromosomes. Together, this study establishes a scalable GFF3-based proteogenomics framework for complex polyploid plant genomes and provides an extensive community resource for wheat genome annotation refinement and visual exploration (https://bread-wheat-um.genome.edu.au/apollo/49826/jbrowse/index.html).
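The GFF3-based peptide coordinate reconstruction described in this abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the function names `project_peptide` and `to_bed_interval` are hypothetical, and the sketch assumes that the CDS segments of one transcript are given as 1-based inclusive GFF3 coordinates and that the coding sequence starts at the first base of the first codon (phase 0).

```python
def project_peptide(cds_segments, strand, aa_start, aa_length):
    """Map a peptide onto genomic blocks via the CDS segments of one transcript.

    cds_segments: list of (start, end), 1-based inclusive (GFF3 convention).
    strand: "+" or "-".
    aa_start: 0-based index of the peptide's first residue in the protein.
    Returns genomic blocks as (start, end), 1-based inclusive, sorted ascending.
    Assumption: the CDS begins in phase 0 (no partial leading codon).
    """
    nt_start = 3 * aa_start                 # first coding nucleotide (0-based)
    nt_end = 3 * (aa_start + aa_length)     # exclusive end in coding space
    # Walk segments in transcript order: ascending on "+", descending on "-".
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    blocks = []
    offset = 0                              # coding nucleotides consumed so far
    for seg_start, seg_end in ordered:
        seg_len = seg_end - seg_start + 1
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + seg_len)
        if lo < hi:                         # peptide overlaps this segment
            if strand == "+":
                g_start = seg_start + (lo - offset)
                g_end = seg_start + (hi - offset) - 1
            else:
                g_end = seg_end - (lo - offset)
                g_start = seg_end - (hi - offset) + 1
            blocks.append((g_start, g_end))
        offset += seg_len
    return sorted(blocks)


def to_bed_interval(chrom, block):
    """Convert a 1-based inclusive block to a 0-based half-open BED interval."""
    start, end = block
    return (chrom, start - 1, end)


# Two CDS segments: 9 nt (3 residues) and 12 nt (4 residues).
cds = [(101, 109), (201, 212)]
plus_blocks = project_peptide(cds, "+", aa_start=2, aa_length=3)
minus_blocks = project_peptide(cds, "-", aa_start=0, aa_length=5)
```

A peptide that spans a splice junction yields more than one block, which is what makes exon-resolved display in a genome browser possible.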

2

Hidden Structural Bias in Proteomics: Sonication-induced Selective Fragmentation of Intrinsically Disordered Regions

Narita, M.; Yamakawa, T.; Nishimura, R.; Iwasaki, M.

2026-07-15 cell biology 10.64898/2026.07.14.738389 medRxiv

Top 0.1%

12.6%

Sonication is a fundamental technique in proteome sample preparation, primarily used for protein solubilization and shearing of genomic DNA. Although the mechanical shearing of DNA is well-characterized, its unintended impact on protein structural integrity remains a significant "blind spot" in high-throughput analytical workflows. In this study, we systematically investigated sonication-induced protein fragmentation by combining gel-based fractionation (PEPPI-MS) with sequence-level compositional analysis and bioinformatic mapping. Our results demonstrate that sonication does not significantly alter overall proteome identification or the recovery of membrane proteins; however, it induces extensive and non-random protein fragmentation. Sonication caused an approximately three-fold increase in the abundance of >45 kDa protein-derived fragments migrating into the <40 kDa fraction, and 1,620 high-molecular-weight (MW) proteins were uniquely detected in the lower-MW fraction upon sonication, an eight-fold increase over non-sonicated controls. Peptide-level amino acid composition analysis revealed subtle but directional shifts in the sonication-derived fragments. This residue-level signature is reinforced by two orthogonal structural analyses (MobiDB peptide-level mapping and protein-level profiling using metapredict V3 software), which show that sonication-susceptible proteins harbor more than twice the disordered content of length-matched controls (median 40% vs. 18%). This study identifies a previously unrecognized "structural bias" whereby intrinsically disordered region (IDR)-rich proteins are selectively compromised during sample preparation. 
Because these fragments are indistinguishable from enzymatic digestion products in conventional bottom-up proteomics, the underlying structural damage is effectively masked in global quantitative datasets, potentially distorting biological interpretations related to protein size, isoforms, and stability, particularly for IDR-rich classes, such as transcription factors and signaling molecules. We propose that optimizing and standardizing sonication parameters is essential for ensuring the accuracy and reproducibility of quantitative proteomic analyses.

3

Enhanced proteome relative quantification using refined quantotypic spectral libraries

Barnes, B. A.; Alharbi, H.; Unwin, R.

2026-07-10 bioinformatics 10.64898/2026.07.06.736793 medRxiv

Top 0.1%

7.9%

Plasma proteomics is used for a variety of applications including biomarker discovery, disease monitoring, and drug development. Data-independent acquisition (DIA) has vastly improved the breadth of proteins that are identified from samples; however, given challenges in reproducibility and translation, it is critical that the quantitative performance of these methods is reliable. Analysis of global proteomics data typically incorporates information from all detected peptides. However, some peptides do not reflect their parent protein amount, due to irreproducible digestion, modification, analytical interferences or instability. We hypothesise that including these peptides impacts protein relative quantification, and thus, a refined spectral library containing only quantitatively representative peptides provides superior protein quantification. By analysing a defined multi-species spike-in model, we show that refining a plasma spectral library by removing precursors that fail to meet quality control metrics (25.4% of all identified precursors) reduces noise and variability, improving precision, accuracy and differential abundance analysis by up to ~11%, with minimal identification losses and substantial reduction in computational demand. This demonstrates proof-of-concept that refining spectral libraries produces results that prioritize quantification quality over quantity. This approach could enable development of universal tissue-specific refined spectral libraries able to improve quantification quality with easy implementation and minimal processing time. Significance of the Study: As DIA mass spectrometry proteome depth increases, the quality of the associated protein quantifications must be considered alongside identification breadth, particularly in complex matrices such as plasma, which presents additional technical challenges.
The spectral library used for protein identification and quantification is a critical determinant of DIA performance, and its composition requires considerable consideration. This work illustrates an initial step toward improving protein quantification starting at the spectral library level by filtering precursors which are poor quantitative representatives of their parent proteins. In doing so, the resulting data is more reliable for downstream and biological interpretation, with fewer false differential abundance assignments and reduced quantitative noise. As such, this work represents a broader shift away from the habitual focus of MS workflows on maximising the number of protein and differential abundance identifications and instead prioritises the quality of quantification over quantity. These initial findings lay the groundwork for further development of spectral library refinement strategies, with the potential to continue improving the accuracy and precision of protein quantification in DIA-based proteomics.
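The idea of removing precursors that fail quality control before quantification can be sketched as a simple filter. The abstract does not state which metrics were used, so the two criteria below (a minimum number of observations and a maximum coefficient of variation across replicate runs), their default values, and the name `refine_library` are assumptions chosen purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    return stdev(values) / m if m > 0 else float("inf")


def refine_library(precursor_intensities, max_cv=0.2, min_observations=3):
    """Keep precursors that are observed often enough and vary little.

    precursor_intensities: dict mapping a precursor ID to a list of replicate
    intensities, with None for runs in which the precursor was not detected.
    max_cv and min_observations are hypothetical QC thresholds.
    """
    required = max(2, min_observations)   # stdev needs at least two values
    kept = {}
    for precursor, intensities in precursor_intensities.items():
        observed = [x for x in intensities if x is not None]
        if len(observed) < required:
            continue                      # too sparse to judge reproducibility
        if coefficient_of_variation(observed) <= max_cv:
            kept[precursor] = observed
    return kept


# Hypothetical replicate intensities for three precursors.
library = {
    "PEPTIDEK/2": [100.0, 105.0, 98.0],   # reproducible
    "NOISYR/2": [100.0, 300.0, 20.0],     # highly variable
    "SPARSEK/3": [50.0, None, None],      # rarely detected
}
refined = refine_library(library)
removed_fraction = 1 - len(refined) / len(library)
```

In a real setting the surviving precursor IDs would be used to subset the spectral library before the DIA search, so the filter is applied once rather than per experiment.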

4

Development of an Ethylenediaminetetraacetic Acid-Enhanced Deep Proteomic Profiling Method for Dried Blood Spots and Its Application in Mouse Disease Models

Nakajima, D.; Kanno, T.; Okuda, Y.; Mitsui, H.; Konno, R.; Ueyama, N.; Endo, Y.; Ohara, O.; Kawashima, Y.

2026-07-14 molecular biology 10.64898/2026.07.13.738354 medRxiv

Top 0.1%

5.5%

Dried blood spots (DBS) are well-established microsamples used in clinical testing and newborn screening. However, their use in deep proteomics is hindered by highly abundant blood proteins and inefficient protein recovery from filter paper matrices. The non-targeted analysis of non-specifically DBS-absorbed proteins (NANDA) workflow partially overcomes the impact of abundant blood proteins and has enabled the identification of over 5,000 proteins from DBS samples. Nonetheless, residual abundant proteins, including hemoglobin and fibrinogen, constrain deep proteomic analysis. Therefore, this study aimed to evaluate the effects of the metal chelator ethylenediaminetetraacetic acid (EDTA) on the depth of DBS proteomic analysis. An optimized EDTA-enhanced NANDA protocol that incorporated a 100 mM EDTA wash step was compatible with standard DBS collection procedures and required no modification of current clinical workflows, markedly enhancing the depletion of abundant proteins and facilitating its potential use in clinical and translational settings. When combined with Orbitrap Astral data-independent acquisition mass spectrometry, this approach enabled the single-shot identification of more than 7,000 proteins from DBS samples; to the best of our knowledge, this represents the deepest proteome coverage reported to date, and the workflow further supported high-throughput and highly reproducible analyses. Additionally, its application to mouse disease models revealed disease-specific systemic immune signatures from minimal blood volumes. Collectively, these results establish EDTA-enhanced NANDA as a practical and scalable workflow that overcomes longstanding limitations of DBS proteomics, thereby enabling deep, high-throughput, minimally invasive proteomic profiling across diverse biological and experimental contexts.

5

Sortase-mediated enrichment of ubiquitinated proteins from complex samples

Raniszewski, N.; Beckley, K.; Hintzen, J.; Noel, M.; Burslem, G.

2026-07-01 biochemistry 10.64898/2026.06.29.735432 medRxiv

Top 0.1%

5.0%

Despite its importance in cellular signaling and protein fate, the detection of protein ubiquitination in proteomics experiments presents many challenges for researchers. Importantly, current techniques that often rely on antibodies specific for lysine sidechain modifications may miss non-canonical ubiquitination sites in experiments. We envisioned a strategy that uses sortase, a bacterial transpeptidase enzyme, to selectively modify ubiquitination sites with a biotin tag for enrichment and downstream proteomics experiments. In this work, we demonstrate our ability to selectively modify N-terminal diglycine remnants in digested proteins with a biotin-modified peptide, enabling downstream enrichment of previously ubiquitinated proteins. We show this proof of concept on several recombinant proteins, revealing a site of autoubiquitination in the E2 conjugating enzyme Ubc13. We show that elution of the enriched peptides can be achieved by using common guanidinium elutions or by leveraging the reversibility of sortase. Finally, we include a bifunctional peptide that is labile to trypsinization to better streamline this strategy for downstream proteomics approaches. We envision that this approach will provide an accessible strategy for the detection of ubiquitinated proteins in proteomics experiments, with the goal of enabling researchers to better detect non-canonical protein ubiquitination.

6

Protein Aggregation Capture for Top-down Proteomics

Feltenstein, I. G.; Drown, B. S.

2026-07-03 biochemistry 10.64898/2026.07.02.736076 medRxiv

Top 0.1%

4.9%

Proteins are dynamically regulated by a myriad of post-translational modifications (PTMs) that control their stability, conformation, activity, subcellular localization, and local interactions. Capturing the precise composition of these various modification states, or proteoforms, is a principal objective of top-down proteomics (TDP). By ionizing intact proteoforms and combining measurements of precursor ion and fragment ion masses, the position, stoichiometry, and combination of PTMs can be determined. Despite the highly valuable measurements that TDP can provide, it is typically less sensitive than corresponding peptide-level analysis with many reports utilizing input material in the microgram to milligram range. Contributing to this lack of sensitivity is the risk of sample loss due to non-specific binding to surfaces during sample preparation. The most widely employed sample preparation approaches for TDP either require high sample input (e.g. precipitation and ultra-filtration) or fail to effectively remove surfactants (e.g. solid-phase extraction). These limitations have hindered advancement of targeted TDP applications involving immunoprecipitation and other enrichment strategies. Bead-assisted protein aggregation, also referred to as single-pot, solid-phase-enhanced sample preparation (SP3), has emerged as a popular sample preparation strategy for bottom-up proteomic workflows, but has only been used in TDP with secondary ion exchange chromatography cleanup. We envisioned a magnetic bead based protein cleanup approach that proceeds directly to MS analysis with judicious choice of bead surface chemistry and elution conditions. Here we report a sample preparation method using hydroxyl-functionalized magnetic beads for top-down proteomics applications.

7

Systematic optimization and benchmarking of synchro-PASEF for high-throughput phosphoproteome profiling

Brademan, D.; Mullarkey, A.; Greeson, M.; Szvetecz, S.; Vitek, O.; Blythe, E.; Huttenhain, R.

2026-06-27 biochemistry 10.64898/2026.06.26.734570 medRxiv

Top 0.1%

4.7%

High-throughput data-independent acquisition (DIA) workflows paired with short chromatographic separations are increasingly adopted for systems biology and clinical proteomics. However, narrower peak widths from rapid separations demand faster mass spectrometer cycle times to maintain quantitative depth and reproducibility. The synchro-PASEF acquisition mode on timsTOF mass spectrometers diagonally scans across ion mobility and m/z space, enabling efficient sampling of the precursor ion cloud with shortened cycle times. While synchro-PASEF has demonstrated competitive identification depth for global protein abundance samples compared to conventional dia-PASEF, its performance for phosphoproteomics - where the precursor ion cloud is characteristically broader and bimodally distributed - has not been evaluated. Here, we systematically optimized synchro-PASEF methods for phosphoproteomics and benchmarked performance against two dia-PASEF methods across three sub-hour separations. We found that synchro-PASEF performance depends critically on balancing diagonal window number, total isolation width, and gradient length, with longer gradients favoring more windows for selectivity and shorter gradients favoring fewer windows to preserve sampling frequency. An optimized configuration quantified over 19,000 localized phosphosites using a 23-minute separation. Retention time summation (RTsum) with a factor of 2 increased phosphopeptide identifications by 5-20% and reduced phosphosite-level coefficients of variation by up to 30% across all dia-PASEF and synchro-PASEF methods tested. 
Using β2-adrenergic receptor (B2AR) activation as a signaling model, we demonstrate that label-free DIA phosphoproteomics can be used to model phosphoproteomics dose-response relationships, showing that synchro-PASEF and dia-PASEF produce highly concordant phosphoproteomic responses, with comparable numbers of responding phosphosites, similar effect sizes, and nearly identical predicted protein kinase A (PKA) substrates downstream of the activated B2AR. While synchro-PASEF did not surpass optimized dia-PASEF in identification depth, its comparable biological performance and amenability to post-acquisition optimization through RTsum support its utility for high-throughput phosphoproteomics. This work provides a transferable framework for synchro-PASEF method optimization and demonstrates the broad utility of retention time summation for PASEF-based phosphoproteomics workflows.

8

Unveiling Cerebrospinal Fluid Protein Biomarkers in Pediatric Acute Lymphoblastic Leukemia Using Proximity Extension Assay

Moballegh Nasery, M.; Gergely, R.; Kutszegi, N.; Szegedi, I.; Erdelyi, D. J.; Kiss, C.; Csosz, E.

2026-07-03 biochemistry 10.64898/2026.07.03.736065 medRxiv

Top 0.2%

4.3%

Background: Acute Lymphoblastic Leukemia (ALL) is a highly heterogeneous pediatric malignancy. Despite high survival rates, relapse and central nervous system (CNS) involvement remain significant clinical challenges. Traditional clinical parameters often lack the precision required for early detection and risk stratification. This study utilizes high-throughput proteomics and machine learning to identify molecular signatures in cerebrospinal fluid (CSF) that characterize disease effect and treatment response. Methods: 82 CSF samples from 41 pediatric ALL patients at diagnosis (VD) and remission (VR) were analyzed. Proteomic profiling of 276 proteins was performed using Olink Proximity Extension Assay. Differentially abundant proteins were identified (q-value < 0.05, |log2FC| > 0.5) using the Wilcoxon rank-sum test. Three machine-learning algorithms (Random Forest, LASSO, and SVM-RFE) were integrated to select the differentially abundant proteins in VR and VD and between CNS involvement levels. To validate the data, Pan-Cancer Atlas analysis was performed using two different platforms. Results: In the remission phase, we observed significant alterations in the expression of key proteins compared to diagnosis, with ADGRG1 and KYNU showing a marked increase, while CCL17, CD5, CD27, CXCL9, CXCL11, FASLG, GZMA, and TNFRSF9 were significantly downregulated. Furthermore, our analysis identified distinct protein signatures associated with CNS involvement: CCL4, CTSC, CXCL10, CXCL9, and MMP7 were differentially abundant at the VD stage, whereas CAIX, CASP-8, HAGH, CXCL9, MMP7, MCP-2, and VWC2 were differentially abundant at the VR stage. Conclusion: Integrating Olink proteomics with machine learning identified molecular signatures in ALL that have the potential to be further developed into a biomarker panel for monitoring treatment response and guiding personalized therapeutic strategies, shifting the focus toward Precision One Health approaches.
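The selection rule quoted in the Methods (q-value < 0.05 and |log2FC| > 0.5) can be sketched as follows. The abstract does not say how q-values were derived, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the input values. The p-values are taken as already computed by a Wilcoxon rank-sum test elsewhere.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_differential(names, p_values, log2_fold_changes,
                        q_cutoff=0.05, lfc_cutoff=0.5):
    """Keep proteins passing both the q-value and the effect-size cutoff."""
    q_values = benjamini_hochberg(p_values)
    return [
        name
        for name, q, lfc in zip(names, q_values, log2_fold_changes)
        if q < q_cutoff and abs(lfc) > lfc_cutoff
    ]


# Hypothetical proteins, p-values and log2 fold changes.
names = ["P1", "P2", "P3", "P4"]
p_values = [0.01, 0.04, 0.03, 0.5]
log2fc = [0.8, 1.2, -0.9, 0.1]
q_values = benjamini_hochberg(p_values)
hits = select_differential(names, p_values, log2fc)
```

With these inputs only P1 passes: P2 and P3 have large effect sizes, but their adjusted values sit just above 0.05. This shows why the two cutoffs are applied jointly.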

9

onsite: An Integrated Framework for Phosphosite Localization and False Localization Rate Estimation

Yue, Q.-X.; Wei, Z.; Dai, C.; Bai, M.; Perez-Riverol, Y.; Sachsenberg, T.

2026-07-11 bioinformatics 10.64898/2026.07.08.737157 medRxiv

Top 0.2%

4.3%

With the rapid development of mass spectrometry-based proteomics, the volume of phosphoproteomic data has increased substantially. However, accurate localization of phosphorylation sites and standardized statistical validation remain critical analytical bottlenecks. To address the lack of standardized cross-algorithm evaluation, we introduce onsite, a unified and open-source Python framework. onsite integrates an alanine-decoy strategy to estimate the false localization rate (FLR) across three algorithms: AScore, PhosphoRS, and pyLucXor. This modular architecture efficiently processes large-scale datasets and enables global FLR calculation. Benchmarking on the standard synthetic phosphopeptide dataset PXD000138 highlighted distinct inter-algorithmic variations. Using the same 5% global FLR threshold, pyLucXor localized the most target sites (28,353). It also reached a high accuracy (91.22%) against the known ground truth, resulting in the largest number of correctly localized sites (25,865). Reanalysis of the highly fractionated, large-scale PXD012255 dataset further demonstrated that native integration of onsite into the quantms pipeline enables scalable processing and provides a standardized framework for FLR control in large-scale phosphoproteomics.
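A decoy-residue approach to a global FLR can be sketched in a few lines. This is a simplified illustration, not the onsite implementation or its API. It assumes each localized site carries a score and a flag saying whether the phosphate was placed on a decoy residue (for example alanine), and that FLR at a score cutoff is estimated as `scale * decoys / targets`. The `scale` factor, the function names and the example scores are all hypothetical.

```python
def flr_curve(sites, scale=1.0):
    """Estimate FLR at every score cutoff.

    sites: list of (score, is_decoy) tuples, higher score = more confident.
    scale: hypothetical normalisation for the relative frequency of target
           versus decoy residues (1.0 means no correction).
    Returns (score, cumulative_targets, estimated_flr), best score first.
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    curve = []
    targets = decoys = 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        flr = scale * decoys / targets if targets else 1.0
        curve.append((score, targets, min(flr, 1.0)))
    return curve


def targets_at_flr(sites, max_flr=0.05, scale=1.0):
    """Largest number of target sites accepted at or below the FLR limit."""
    best = 0
    for _, targets, flr in flr_curve(sites, scale):
        if flr <= max_flr:
            best = max(best, targets)
    return best


# Hypothetical scores: 40 confident targets, one decoy, 10 more targets,
# then three low-scoring decoys.
sites = (
    [(100 - i, False) for i in range(40)]
    + [(55, True)]
    + [(50 - i, False) for i in range(10)]
    + [(30, True), (29, True), (28, True)]
)
accepted_5pct = targets_at_flr(sites, max_flr=0.05)
accepted_1pct = targets_at_flr(sites, max_flr=0.01)
```

Because the same decoy-based estimate can be computed from the scores of any localization algorithm, a shared threshold such as 5% global FLR gives a common basis for comparing the number of accepted sites across tools.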

10

Dynamic Patterns of Nuclear Transcription Factor Abundance in Plant Basal Immunity Revealed by Spatial Proteomics of Arabidopsis Nuclei

Ayash, M.; Proksch, C.; Thieme, D.; Bauer, N.; Lee, J.; Heilmann, I.; Hoehenwarter, W.

2026-07-09 plant biology 10.64898/2026.06.30.735533 medRxiv

Top 0.2%

4.0%

Control of the amount of nuclear proteins is fundamental to regulating plant gene expression, but the mechanisms governing quantitative dynamics of the nuclear proteome during adaptive responses to pathogens are largely unstudied. Highly specific labeling, enrichment and measurement of the nuclear proteome was performed using TurboID LC-MS of Arabidopsis thaliana leaves treated with the pathogen-associated molecular pattern (PAMP), flg22, and/or cycloheximide. The chosen experimental approach allowed discrimination of the effects of translation, nuclear protein import, trafficking of preexisting proteins, derepression, and nuclear protein turn-over upon elicitation of basal immunity. The highly specific, deep coverage of proteins in the nucleus makes this study a resource for anyone interested in plant nuclear proteome dynamics and defense. Around 2,000 nuclear proteins were repeatedly quantified, including more than 300 transcription factors or other proteins related to transcription. Several proteins with documented activity in endosomes were newly synthesized and imported into nuclei upon PAMP challenge, suggesting alternative nuclear functions in PAMP-triggered immunity (PTI). Circadian clock components, including the transcription factor, CIRCADIAN CLOCK ASSOCIATED 1 (CCA1)-HIKING EXPEDITION (CHE), were depleted upon PAMP challenge, suggesting a safeguard against untimely induction of systemic acquired resistance (SAR). Based on proteomic patterns, proteins moonlighting in the nucleus as well as trafficking and turn-over regulation of the proteome are common elements during plant immunity.

11

MassSpectrum Analyzer: An interactive platform for proteomic searching parameter refinement and peptide modification focused re-scoring

Karlic, K. I.; Scott, N. E.

2026-06-28 bioinformatics 10.64898/2026.06.22.733873 medRxiv

Top 0.2%

4.0%

Peptide spectrum annotation is critical for the assignment of peptides and the localisation of modifications. While many existing tools provide spectrum annotation capacities, they often lack the flexibility required to allow bespoke spectral annotation of peptides containing multiple labile modifications or the accurate assignment of peptides in which fragmentation deviates from canonical patterns. In these cases, user-guided annotation is widely used to improve assignment completeness, however it typically does not integrate peptide scoring, making it challenging to assess the empirical improvement of the associated annotation and its impact on downstream false-discovery rate estimations. Here, we introduce an interactive annotation environment, the 'MassSpectrum Analyzer', which aims to streamline the exploration and analysis of modified peptides by enabling user-defined customisation with peptide scoring. Using (2-Aminoethyl)trimethylammonium carboxyl-derivatised peptides and glycopeptides as case studies we demonstrate the capacity of the MassSpectrum Analyzer to rapidly explore and allow the assessment of modified peptide datasets. By enabling direct assessment of the impact of user-guided choices on peptide scoring, we show how the detection of highly modified peptides can be improved through post-search integration of modification fragmentation information in a statistically robust manner. Similarly, by permitting comparisons of peptide ion intensities across spectra, we show that global fragmentation patterns can be quantified allowing the interrogation of trends that only become clear when spectra are assessed en masse. Combined, the MassSpectrum Analyzer streamlines the generation of publication-ready spectra and provides a means to assess how the inclusion of annotated features influences assignment scores.

12

gamdid: generalized additive models for differential distributions in single cell experiments

Clement, L.; Beerland, L.; Martens, L.; Vanderaa, C.; Vandenbulcke, S.

2026-06-23 genomics 10.64898/2026.06.18.733106 medRxiv

Top 0.2%

3.4%

Single-cell proteomics (SCP) generates protein abundance measurements across hundreds to thousands of individual cells, offering unprecedented resolution to study cellular heterogeneity. However, existing differential abundance (DA) methods are limited to detecting shifts in mean expression, leaving biologically relevant differences in shape undetected. Indeed, the specific power of SCP is to identify differences between individual cells in a population, which are typically only found as shape differences rather than in mean expression. We here therefore present gamdid (generalized additive models for differential distributions), a novel statistical framework and R package for differential distribution (DD) analysis in SCP data. gamdid is based on generalized additive models (GAMs) to flexibly model heterogeneous distributions, perform inference and provide interpretable visualizations. Through semi-synthetic benchmarking on two SCP datasets, gamdid demonstrates conservative false discovery rate control and substantially outperforms competing methods for differences in shape, while achieving comparable performance for mean shifts. A spike-in case study further demonstrates the utility of gamdid and its interpretable visualization. Uniquely among DD methods, gamdid supports omnibus testing across more than two groups, with post-hoc pairwise comparisons via stagewise testing, and is specifically tailored for proteomics abundance data.

13

ProtPen combines sequence- and structure-based approaches to facilitate protein function predictions on a proteome-wide scale

Mathai, D.; Schulze, S.

2026-07-11 bioinformatics 10.64898/2026.07.11.737882 medRxiv

Top 0.2%

3.3%

Proteins of unknown function represent a significant gap in our understanding of biological processes, encompassing large portions of the proteomes of many organisms, especially prokaryotes. Addressing this gap is critical to understanding the biology and pathogenicity of such organisms. We introduce ProtPen, an open-source pipeline that facilitates protein function prediction by combining eggNOG-mapper for sequence-based annotation with Foldseek for rapid structural similarity searches using AlphaFold-predicted protein structures. Annotation results from both tools are merged and enriched with UniProt metadata to produce a comprehensive output suitable for downstream analysis. The pipeline requires only a FASTA input file with UniProt identifiers, and is designed to analyze datasets on the scale of whole proteomes. Benchmarking on a curated dataset of well-characterized Pseudomonas aeruginosa proteins demonstrated an annotation accuracy of >90%, and highlighted the complementarity of sequence- and structure-based methods. Further evaluation of ProtPen included its application to biologically relevant datasets, comprising proteins of unknown function that exhibited significant differential abundances in a proteomics dataset of P. aeruginosa, and uncharacterized glycoproteins from Haloferax volcanii. ProtPen is readily extensible to incorporate additional protein function prediction tools. In summary, this pipeline facilitates the systemwide annotation of proteins of unknown function from proteomic datasets and whole proteomes.

14

A High Throughput SPR-Based Array for Quantitative Profiling of Glycosaminoglycan Protein Interactions

Jowitt, T. A.; Birchenough, H. L.; Popplewell, J. F.; Dyer, D. P.; Day, A. J.

2026-07-04 biophysics 10.64898/2026.07.02.736113 medRxiv

Top 0.2%

3.2%

Glycosaminoglycans (GAGs) are linear, negatively charged polysaccharides that mediate a wide variety of biologically critical interactions with proteins, underpinning growth factor signalling, extracellular matrix assembly and numerous disease processes. However, GAG-protein interactions remain under-characterised, in part because of the lack of high-throughput tools to systematically profile binding across the GAG interactome. In this paper we present a novel Surface Plasmon Resonance-based array methodology utilising 16 commonly sourced GAG preparations (including chondroitin sulphate (CS), dermatan sulphate (DS), heparan sulphate, heparin, hyaluronan and keratan sulphate) allowing the specificity and affinity of GAG-binding proteins to be determined. As proof of principle, we have validated the array using four established GAG-binding proteins (antithrombin III, CD44, heavy chain 1 from inter-α-inhibitor and Slit2), generating data consistent with the known binding specificities and quantifying affinities for many of the interactions. The array also reveals previously unreported GAG interactions, including Slit2 binding to CS and DS, and CD44 binding to chondroitin sulphate E.

15

RegulomeXplorer: Interactive exploration of drug effects on subcellularly resolved proteomes

Uiberacker, M.; Iellici, T.; Afanaseva, E.; Meier-Menches, S.; Zanghellini, J.

2026-07-03 bioinformatics 10.64898/2026.06.29.735319 medRxiv

Top 0.3%

2.8%

Mass spectrometry-based proteomics allows the quantification of drug-induced changes in protein abundance. However, the integration of perturbation data across subcellular compartments remains a challenging bottleneck. Here, we present RegulomeXplorer, a web-based tool for automated processing and interactive exploration of subcellular compartment-resolved proteomics data. RegulomeXplorer employs MaxQuant output files to determine differential protein regulations upon drug perturbation, performs functional enrichment analysis, and visualizes enriched terms on a two-dimensional cytoplasmic-nuclear plane, called a regulome. Visualizing the data as regulomes allows simultaneous assessment of the magnitude of drug perturbation effects within separate subcellular compartments and of the contribution of regulated proteins to the position of each enriched term in the regulome plane. We validated RegulomeXplorer against previously published, manually curated regulome analyses. It was then applied to subcellular compartment-resolved breast cancer cell line proteomes, revealing drug- and cell-line-specific responses to Doxorubicin and Taxol, both in line with their described mode of action. RegulomeXplorer provides an accessible workflow for interpreting compartment-resolved perturbation proteomics and generating mode of action hypotheses in drug-response studies. RegulomeXplorer is freely available without registration at https://chemnettools.anc.univie.ac.at/RegulomeExplorer/.

16

Sequential Penta-Omic Extraction Method Using Single Biospecimens of Post-mortem Human Brain

Lyon, S. P.; Ehrmann, B. M.; Webb, T. S.; Arciniega, C.; Herring, L. E.; Guo, S.; Parnham, S.; Scott, W. K.; Mieczkowski, P. A.; Macdonald, J. M.

2026-06-29 biochemistry 10.64898/2026.06.26.734872 medRxiv

Top 0.3%

2.5%

A multi-omic approach utilizing a single biospecimen is important to avoid intra-sample heterogeneity associated with testing multiple omic single-samples, and for more efficient use of small volumes of precious biopsies (<30 mg). This is especially true for the microanatomy of post-mortem human brain samples. Using post-mortem human brain biospecimens from the NIH NeuroBioBank, a penta-omic sequential extraction method is described: Simultaneous Metabolomic, Proteomic, Lipidomic - DNA, RNA Extraction (SiMPL-DREx). Each sequential omic extract was compared to those obtained by the gold standard single omic method. Preserving the RNA integrity number (RIN) is critical for brain and tissue banks, as it is a primary measure of tissue quality. For all five omic extracts, the tissue integrity numbers and omic profiles did not significantly differ from those obtained by the respective omic gold standard method. Unlike past multi-omic studies, this study quantified the relative solvent percentages and upstream losses for both the organic and aqueous phases, confirming an omics loss of under 5%.

17

Curating MitoCore: A Standardized Small-Scale Human Metabolic Model as Platform for Proteomics Integration and Disease Modeling

Lange, E.; Santamaria, A. B. R.; Heyer, R.

2026-07-09 systems biology 10.64898/2026.06.29.734258 medRxiv

Top 0.3%

2.5%


Motivation: Central human metabolism powers cellular processes, yet its dysregulation in disease remains poorly understood. While comprehensive genome-scale metabolic models like Human-GEM are available, their size limits interpretability and computational efficiency. Conversely, the smaller MitoCore model is more manageable but lacks the standardized annotations and curated gene-protein-reaction (GPR) associations necessary for omics integration such as protein-constrained modeling. Improving MitoCore's annotation quality is therefore essential for its use in integrative workflows. Results: We systematically updated MitoCore to enhance compatibility with the protein-constrained modeling framework sMOMENT. By restructuring legacy annotations and integrating data from Human-GEM and MitoMammal, we increased EC codes from 354 to 593 and UniProt-annotated genes from 0 to 592. MitoCore captures central metabolic processes, as confirmed by mapping its reactions to 51 of 106 metabolic KEGG modules. Integrating thrombocyte proteomics and experimental ATP data into the original and curated models showed an increase in mapped proteins (228 to 294) and in reactions with kcat values (295 to 310), adding 43 protein-constrained reactions. Consequently, prediction errors for exchange fluxes and ATP production decreased by 19% and 88%, respectively, with 100% of ATP predictions falling within the 95% confidence interval (compared to 16% for the original model). Finally, we implemented a continuous integration/continuous deployment pipeline for automated updates from future Human-GEM releases. These improvements provide a computationally efficient, well-annotated model for studying central metabolism across human cell types. Availability and Implementation: All source code for reproducing results from this paper is available at https://doi.org/10.5281/zenodo.20813825.
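
To illustrate why more reactions with kcat values tighten a protein-constrained model, the sketch below evaluates a generic enzyme-pool constraint, sum over reactions of |v| * MW / kcat <= P. This is only a minimal illustration of that kind of constraint; the reaction names, kcat values, molecular weights and pool size are hypothetical and are not taken from the curated MitoCore model, and the abstract does not state the exact formulation used.

```python
# Illustrative enzyme-pool constraint; every name and number here is hypothetical.

def enzyme_mass_demand(fluxes, kcat_per_s, mw_g_per_mmol):
    """Enzyme mass (g per gDW) needed to carry the given fluxes.

    fluxes:         reaction id -> flux in mmol/gDW/h
    kcat_per_s:     reaction id -> turnover number in 1/s
    mw_g_per_mmol:  reaction id -> enzyme molecular weight in g/mmol (= kDa)
    Reactions without a kcat entry contribute nothing (they stay unconstrained).
    """
    total = 0.0
    for rxn, v in fluxes.items():
        if rxn not in kcat_per_s:
            continue
        kcat_per_h = kcat_per_s[rxn] * 3600.0  # convert 1/s to 1/h to match flux units
        total += abs(v) * mw_g_per_mmol[rxn] / kcat_per_h
    return total


def within_pool(demand, pool_g_per_gdw):
    """True if the flux distribution fits into the available enzyme pool."""
    return demand <= pool_g_per_gdw


fluxes = {"HEX1": 1.0, "PFK": 1.0, "EX_glc": -1.0}  # exchange reaction has no kcat
kcat = {"HEX1": 100.0, "PFK": 50.0}
mw = {"HEX1": 100.0, "PFK": 85.0}

demand = enzyme_mass_demand(fluxes, kcat, mw)
print(f"enzyme demand: {demand:.6f} g/gDW")
```

Each additional reaction that receives a kcat value adds a term to the sum, so the same enzyme pool supports a smaller set of feasible flux distributions.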

18

Kinetic Lipidomics: Quantifying in vivo changes in lipid metabolism using metabolic labeling

Nielsen, C.; Denton, R.; Driggs, B.; Gates, S.; Hilton, T.; Naylor, B.; Quilling, C.; Virgin, K.; Cutler, K.; Sorensen, M.; Poulson, M.; Snedaker, P.; Hernandez, Z.; Transtrum, M.; Price, J. C.

2026-07-01 biochemistry 10.64898/2026.06.29.735310 medRxiv

Top 0.3%

2.4%


Lipid metabolism reflects the dynamic balance between metabolic turnover and concentration. Kinetic mass spectrometry (MS) enables direct quantification of molecular turnover in vivo, and MS-based kinetic proteomics has already provided powerful insights into proteome regulation. Analogous lipidome-wide kinetic measurements remain limited by challenges in defining molecule-specific labeling behavior. Isotope labeling with deuterated water (2H2O) is commonly used to monitor turnover of palmitate and other select lipids by measuring deuterium (2H) labeling of stable C-H positions. Here, we extend the deuterium-incorporation model underlying these targeted lipid turnover assays to support untargeted analysis of all detectable lipids. This allows us to empirically quantify the effective fraction of endogenous synthesis (Asyn) and the turnover rate (k) across hundreds of lipid species simultaneously. One central barrier to lipidome-wide kinetic modeling is determining the number of endogenous deuterium-labeling sites for each molecule (nL), an essential component of biological kinetic assays that is required to estimate Asyn and k accurately. In kinetic proteomics, curated amino acid nL libraries enable peptide-level modeling by summing sequence-specific labeling-site values, but comparable resources are lacking for lipids, and gaps also remain for amino acids under modified metabolic conditions or in non-mammalian systems. Here, we empirically determine lipid nL values and validate the process with peptides against an nL library. To evaluate this strategy in a biologically relevant setting, we applied it to brain tissue from transgenic mice expressing human ApoE isoforms, where altered lipid transport and metabolism are implicated in Alzheimer's disease risk. These data validate the method in a clinically relevant context and suggest that genotype-dependent metabolism can alter empirically determined lipid nL values.
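
The roles of Asyn, k and nL can be made concrete with a small sketch. It assumes single-pool, first-order kinetics (a rise to a plateau at Asyn) and independent labeling of each of the nL sites at the body-water enrichment; the abstract does not spell out the model, so both assumptions, the function names and all numbers are illustrative only.

```python
import math

def fraction_new(t_days, k_per_day, a_syn):
    """Fraction of a lipid pool that is newly synthesized after t days.

    Assumed model: f(t) = Asyn * (1 - exp(-k * t)).
    """
    return a_syn * (1.0 - math.exp(-k_per_day * t_days))


def estimate_k(t_days, observed_fraction, a_syn):
    """Invert the assumed model for a single time point to recover k."""
    ratio = observed_fraction / a_syn
    if not 0.0 <= ratio < 1.0:
        raise ValueError("observed fraction must lie in [0, Asyn)")
    return -math.log(1.0 - ratio) / t_days


def expected_deuterium(n_l, water_enrichment, frac_new):
    """Mean number of incorporated deuterium atoms per molecule.

    Assumes each of the n_l sites is labeled independently with a
    probability equal to the body-water 2H enrichment.
    """
    return n_l * water_enrichment * frac_new


# Hypothetical lipid: k = 0.1 per day, Asyn = 0.8, sampled on day 7.
f7 = fraction_new(7.0, 0.1, 0.8)
k_hat = estimate_k(7.0, f7, 0.8)
print(f"fraction new at day 7: {f7:.3f}, recovered k: {k_hat:.3f} per day")
```

Because the expected deuterium count scales directly with nL, an error in nL propagates into the inferred fraction of new molecules and from there into both Asyn and k.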

19

Reproducible-by-design: Romics Processor, a FAIR ecosystem for multi-omics and spatial-omics analysis

Gorman, B. L.; Bhotika, H.; Jehrio, M.; Purkerson, J. M.; Carlin, F.; Nakayasu, E. S.; Misra, R. S.; Adkins, J. N.; Anderton, C. R.; Pryhuber, G.; Clair, G. C.

2026-07-15 bioinformatics 10.64898/2026.07.09.737600 medRxiv

Top 0.3%

2.4%


Multi-omics and spatial-omics technologies are exploding in use, producing increasingly complex datasets. Existing bioinformatics tools are developing rapidly but fail to fully enforce the FAIR principles, leaving the field vulnerable to escalating issues in computational reproducibility. Here, we introduce a reproducible-by-design paradigm represented in an omics data processing package, RomicsProcessor. At its core is the "Romics_object", a self-contained digital artifact that encapsulates the full history of the data from the original data to the fully processed state, capturing the details of the transformative steps and the required dependencies. This architecture ensures that computational workflows are fully portable and reproducible. In this manuscript, we demonstrate RomicsProcessor's computational capabilities and scalability on diverse datasets, including bulk proteomics, large-scale multiplexed immunofluorescence, and multi-batch mass spectrometry imaging. By providing a robust framework for analysis that truly follows the FAIR Data Principles, RomicsProcessor is a blueprint for the next generation of reproducible bioinformatics tools that can dramatically accelerate discovery in multi-omics biology in the era of artificial intelligence.
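
The idea of a data object that carries its own processing history can be sketched in a few lines. The class below is a hypothetical Python illustration of the concept only; it is not the RomicsProcessor API, and the class name, method name and fields are all invented for this example.

```python
import math
from dataclasses import dataclass, field

@dataclass
class TrackedData:
    """Hypothetical container: values plus the ordered log of steps applied to them."""
    values: list
    steps: list = field(default_factory=list)

    def apply(self, name, func, **params):
        """Return a new object with func applied to every value and the step logged."""
        new_values = [func(v, **params) for v in self.values]
        new_steps = self.steps + [{"step": name, "params": params}]
        return TrackedData(new_values, new_steps)


raw = TrackedData([1.0, 2.0, 4.0])
logged = raw.apply("log2", lambda v: math.log2(v))
shifted = logged.apply("shift", lambda v, offset: v + offset, offset=1.0)

for entry in shifted.steps:
    print(entry)
```

Returning a new object at each step leaves the earlier states untouched, so the recorded history can be replayed from the raw values to reproduce the processed result.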

20

EnrichViz: An Interactive R Shiny Application for Visualization of Pathway Enrichment Results from Omics Data

Garcia-Milian, R.

2026-06-23 bioinformatics 10.64898/2026.06.19.733398 medRxiv

Top 0.3%

2.4%


Pathway and functional enrichment analysis is a cornerstone of omics data interpretation, enabling researchers to map differentially expressed proteins or genes onto curated biological processes, signaling cascades, and molecular functions. While tools such as Ingenuity Pathway Analysis (IPA), g:Profiler, and Enrichr are widely used to generate ranked enrichment results, translating these tabular outputs into clear, publication-ready figures remains a time-consuming step that typically requires custom scripting and familiarity with visualization libraries -- a significant barrier for researchers without a computational background. Here we present EnrichViz, a self-contained, browser-based R Shiny application that enables interactive, code-free visualization of pathway and functional enrichment results from quantitative proteomics, transcriptomics, and metabolomics experiments. EnrichViz accepts three standard CSV files as input -- a normalized abundance matrix, a sample annotation or metadata file, and enrichment results from any platform that exports tabular output -- and produces six complementary, publication-ready visualizations: bar and bubble plots for ranking enriched terms by significance, chord diagrams for exploring pathway-molecule connectivity, clustered heatmaps for displaying Z-score normalized expression patterns across experimental groups, and boxplots or violin plots for examining the abundance distribution of individual proteins, genes, or metabolites. The application supports both raw p-values and pre-transformed -log10(p) values through automatic detection, and all plot parameters are adjustable in real time through a graphical sidebar. Every figure can be exported as a high-resolution PNG file at 300 dpi. EnrichViz is implemented in R using the Shiny, ggplot2, pheatmap, and circlize packages, and is freely available at https://rgmilian.shinyapps.io/EnrichViz/.
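
Two of the steps mentioned above, automatic detection of raw versus pre-transformed p-values and Z-score normalization of heatmap rows, can be sketched as follows. The abstract does not describe the detection rule, so the range-based heuristic here is an assumption, and the function names are hypothetical; Python is used for illustration although the application itself is written in R.

```python
import math
import statistics

def to_neg_log10(values):
    """Return -log10(p) values, guessing whether the input is already transformed.

    Assumed heuristic: if every value lies in (0, 1], treat the column as raw
    p-values; otherwise treat it as already being -log10(p). A column of
    -log10(p) values that are all at most 1 would be misread, so a real tool
    may need an explicit override.
    """
    if any(v < 0 for v in values):
        raise ValueError("neither p-values nor -log10(p) can be negative")
    if all(0 < v <= 1 for v in values):
        return [-math.log10(v) for v in values]
    return list(values)


def row_z_scores(row):
    """Z-score one abundance row using the sample standard deviation."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row) if len(row) > 1 else 0.0
    if sd == 0:
        return [0.0 for _ in row]  # constant row: no variation to display
    return [(x - mean) / sd for x in row]


raw_p = to_neg_log10([0.01, 0.001])
already = to_neg_log10([2.0, 3.0])
z = row_z_scores([1.0, 2.0, 3.0])
print(raw_p, already, z)
```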