PROTEOMICS — Latest Matching Preprints

1

Assessing extracellular vesicle proteins as predictive biomarkers for developing type 1 diabetes

Dakup, P. P.; Bramer, L.; Schepmoes, A.; Diaz Ludovico, I.; Flores, J.; Mirmira, R.; Webb-Robertson, B.-J.; Metz, T. O.; Sims, E. K.; Nakayasu, E. S.

2026-02-09 systems biology 10.64898/2026.02.06.703600 medRxiv

Top 0.1%

52.5%

Plasma extracellular vesicles (EVs) are considered excellent sources for biomarker discovery since they carry signatures of their cellular origin and disease processes. In this paper, we evaluate the potential of plasma EV proteomics analysis for identifying predictive biomarkers of developing type 1 diabetes (T1D), which results from autoimmune destruction of insulin-producing β cells in the islet. We used strong anion exchange beads (Mag-Net) to capture plasma EVs from 19 donors with islet autoimmunity (diagnosed by circulating autoantibodies against islet proteins - AAB+) vs. 17 control individuals and analyzed their protein cargo by mass spectrometry. The analysis identified and quantified 5,480 proteins, a 3.2-fold increase in proteome coverage compared to our previous T1D biomarker proteomics study that used whole plasma depleted of the 14 most abundant proteins. The Mag-Net approach also detected 1,306 out of the 1,717 proteins (76%) that we previously verified as EV proteins. Statistical tests revealed 448 proteins to be differentially abundant in AAB+ vs control volunteers, including 69 previously verified EV proteins. A functional-enrichment analysis resulted in overrepresentation of 25 pathways among the differentially abundant proteins, including pathways related to autoimmune response and lipid metabolism. The capacity of this data to predict AAB+ was tested with a machine learning analysis using a random forest model, resulting in a receiver operating characteristic-area under the curve of 0.81. Overall, our study indicates that plasma EV proteomics analysis can be an exciting approach for studying biomarkers for developing T1D.
Significance of the study: Type 1 diabetes (T1D) is a disease characterized by the body's inability to produce insulin and, consequently, to control blood glucose levels. Despite the initial trigger being unclear, the disease development process involves an autoimmune response to the islets of Langerhans, resulting in the death of insulin-producing β cells. There is no cure for the disease, and treatment relies on exogenous administration of insulin. Therefore, preventive therapies that block the autoimmune process are attractive for treating T1D. In fact, anti-CD3 antibody (Teplizumab) delays the onset of T1D by 2 years by targeting T cells. Predictive biomarkers for developing T1D are needed to aid the development and implementation of new therapies and to identify the initial trigger and mechanisms of the islet autoimmune process. In this paper, we assess the potential of plasma extracellular vesicle (EV) proteomics analysis for identifying predictive biomarkers of T1D. Our results show excellent potential of the approach, opening opportunities to perform broader studies to identify biomarkers for developing T1D.
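
Editor's illustration (not from the preprint): the reported ROC AUC of 0.81 can be read as the probability that a randomly chosen AAB+ donor receives a higher model score than a randomly chosen control. The minimal Python sketch below computes that quantity directly from two lists of scores. The function name `roc_auc` and the example scores are hypothetical and are not the authors' data or code.

```python
def roc_auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores_pos: model scores for positive cases (hypothetical AAB+ donors)
    scores_neg: model scores for negative cases (hypothetical controls)
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up scores, for illustration only.
example_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.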

2

Evaluating computational approaches for comparison of protein expression across cancer indications

Wang, J.; Tian, X.; Yu, W.; Pullman, B.; Bullen, J.; Hurt, E.; Zhong, W.

2024-08-27 bioinformatics 10.1101/2024.08.26.609731 medRxiv

Top 0.1%

39.1%

Background: The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) recently generated harmonized genomic, transcriptomic, proteomic, and clinical data for over 1,000 tumors across 10 cohorts to facilitate pan-cancer discovery research. However, protein expression comparison across CPTAC cohorts remains challenging due to non-uniform missing data and varying protein expression distribution patterns across tumor types. Here, we present our efforts to evaluate various missing data handling and normalization strategies to create a normalized pan-cancer protein expression dataset. Results: First, we developed a novel algorithm to select robustly expressed proteins in tumors within any CPTAC cohort. Second, we applied a cohort hybrid imputation approach to protein abundance values from FragPipe within each cohort based on protein expression distribution patterns. Third, we calculated intensity-based absolute quantification using protein abundance values and applied both global and smooth quantile normalization methods. Our results indicate that global quantile normalization ensured identical distribution across cohorts for both tumor and normal tissues, while smooth quantile normalization preserved distribution differences between biological conditions. We assessed our method by comparing differential protein expression analysis results with and without normalization. Additionally, we examined the ranks of protein expression in the normalized CPTAC dataset for selected proteins with high protein-to-RNA expression correlation across CPTAC cohorts. We then compared these protein expression ranks with their RNA expression ranks across corresponding cohorts in The Cancer Genome Atlas (TCGA). Differential protein expression analysis revealed a high level of agreement in the fold change of tumor versus normal tissue within cohorts before and after normalization. Furthermore, our results indicate that global quantile normalization resulted in the highest cohort rank correlation between CPTAC and TCGA for selected proteins. Conclusions: In summary, our thorough analysis demonstrates that global quantile normalization surpasses both smooth quantile normalization and no normalization, as evidenced by its higher rank correlation across cancer cohorts between CPTAC and TCGA for selected proteins. The findings suggest that combining cohort hybrid imputation with global quantile normalization is an effective method for creating a normalized CPTAC pan-cancer protein dataset, which can facilitate the study of protein expression across different cancer types.
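
Editor's illustration (not from the preprint): global quantile normalization, as named in the abstract, forces every sample to share one reference distribution, which is why it yields identical distributions across cohorts. The sketch below shows the textbook procedure in plain Python. It assumes complete, already-imputed data and does not average tied values; the function name and the toy numbers are hypothetical and do not reproduce the authors' pipeline.

```python
def quantile_normalize(columns):
    """Textbook global quantile normalization.

    columns: list of equal-length lists, one list of protein abundances
             per sample (assumed complete, i.e. already imputed).
    Returns a new list of lists in which every sample has the same
    set of values (the reference distribution), assigned by rank.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]   # replace each value by the reference value of its rank
        normalized.append(out)
    return normalized


# Toy data: three samples, four proteins each (made-up numbers).
toy = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
toy_normalized = quantile_normalize(toy)
```

After normalization, sorting any sample gives the same list of values, so between-sample differences in overall distribution are removed while each protein's rank within its sample is preserved.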

3

OncoProExp: An Interactive Shiny Web Application for Comprehensive Cancer Proteomics and Phosphoproteomics Analysis

Sharif Rahmani, E.; Lingasamy, P.; Khojand, S.; Lawarde, A.; Vela Moreno, S.; Salumets, A.; Modhukur, V.

2025-03-10 bioinformatics 10.1101/2025.03.06.641407 medRxiv

Top 0.1%

30.8%

Cancer research has been revolutionized by mass spectrometry (MS)-based proteomics, enabling large-scale profiling of proteins and post-translational modifications (PTMs) to identify critical alterations in cancer signaling pathways. However, the lack of comprehensive, user-friendly platforms for integrative analysis limits efficient data exploration, biomarker identification, and translational insights. To address this gap, we developed OncoProExp, a Shiny-based interactive web application designed for in-depth exploration of cancer proteomes and phosphoproteomes. OncoProExp offers robust workflows for data preprocessing, including missing value imputation and statistical filtering. The platform features interactive visualizations such as principal component analysis (PCA), hierarchical clustering heatmaps, and gene set enrichment analysis (GSEA), enabling detailed functional annotation. Differential expression analysis identifies differentially expressed proteins (DEPs) and phosphoproteins (DEPPs), facilitating the discovery of potential biomarkers and therapeutic targets. The application supports survival analysis and pan-cancer exploration using clinical and proteome/phosphoproteomic datasets. OncoProExp incorporates state-of-the-art predictive modeling using machine learning algorithms, including Support Vector Machines (SVMs), Random Forests, and Artificial Neural Networks (ANNs) for cancer risk stratification, achieving near-perfect accuracy in multi-cancer and single-cancer classification. These models are enhanced by SHapley Additive exPlanations (SHAP) for interpretability. To enhance its translational utility, the platform supports user-uploaded data and enables protein-protein interaction analysis, pathway enrichment analysis, cancer drug relevance evaluation, and clinical annotation using curated cancer-specific datasets. OncoProExp is deployable via Docker containers, ensuring flexible and scalable integration into individual servers.
Its utility has been demonstrated using Clinical Proteomic Tumor Analysis Consortium (CPTAC) datasets, showcasing its potential to advance cancer biomarker discovery, risk stratification, therapeutic target identification, and personalized treatment strategies. OncoProExp is freely accessible at https://oncopro.cs.ut.ee/ without login requirements, offering a comprehensive resource for translational cancer research.

4

ProHap enables proteomic database generation accounting for population diversity

Vasicek, J.; Kuznetsova, K. G.; Skiadopoulou, D.; Johansson, S.; Njolstad, P. R.; Bruckner, S.; Kall, L.; Vaudel, M.

2023-12-24 bioinformatics 10.1101/2023.12.24.572591 medRxiv

Top 0.1%

26.4%

Amid the advances in genomics, the availability of large reference panels of human haplotypes is key to account for human diversity within and across populations. However, mass spectrometry-based proteomics does not benefit from this information. To address this gap, we introduce ProHap, a Python-based tool that constructs protein sequence databases from phased genotypes of reference panels. ProHap empowers researchers to account for haplotypic diversity in proteomic searches.

5

Time-dependent changes to sepsis-specific networks in the plasma proteome are mechanistic readouts of sepsis progression.

Pimienta, G.

2020-09-09 biochemistry 10.1101/2020.09.08.285221 medRxiv

Top 0.1%

26.3%

Sepsis accounts for 1 in 5 deaths globally and is the most common cause of deaths in U.S. hospitals. Despite this public health burden, no diagnostic biomarker, nor therapeutic agent for sepsis has proven useful or effective. The principal obstacle is the lack of a mechanistic understanding of this syndrome, particularly during its onset and progression. Using an experimental model of murine sepsis, we report here a time-dependent assessment of changes to the plasma proteome upon infection with Salmonella enterica serovar Typhimurium. Changes to the plasma proteome signature of sepsis (PPSS) revealed a transition from early inflammation and coagulation to a later stage of chronic inflammation, coagulopathy and bacteremia. This study represents an advance in our understanding of sepsis progression that may guide innovative therapeutic attitudes and help clinicians track sepsis progression.

6

Explainable machine learning for the identification of proteome states via the data processing kitchen sink

Scott, A. M.; Hartman, E.; Malmstroem, J.; Malmstroem, L.

2023-08-31 bioinformatics 10.1101/2023.08.30.555506 medRxiv

Top 0.1%

26.1%

The application of machine learning algorithms to facilitate the understanding of changes in proteome states has emerged as a promising methodology in proteomics research. Unfortunately, these methods can prove difficult to interpret, as it may not be immediately obvious how models reach their predictions. We present the data processing kitchen sink (DPKS) which provides reproducible access to classic statistical methods and advanced explainable machine learning algorithms to build highly accurate and fully interpretable predictive models. In DPKS, explainable machine learning methods are used to calculate the importance of each protein towards the prediction of a model for a particular proteome state. The calculated importance of each protein can enable the identification of proteins that drive phenotypic change in a data-driven manner while classic techniques rely on arbitrary cutoffs that may exclude important features from consideration. DPKS is a free and open source Python package available at https://github.com/InfectionMedicineProteomics/DPKS.

7

Leveraging the Human Panproteome to Enhance Peptide and Protein Identification in Proteomics and Metaproteomics

Canderan, J.; Yuan, R.; Tang, H.; Ye, Y.

2024-11-26 bioinformatics 10.1101/2024.11.25.625239 medRxiv

Top 0.1%

25.4%

In this paper, we developed a novel approach to utilize the human pangenome to improve peptide and protein identification from proteomic data (MS/MS spectra). We propose a new data structure called panproteome graph (PPG), in which nodes are tryptic peptides, to represent the human pangenome. The PPG can be built in linear time and can be utilized via graph traversal using a depth-first search algorithm to generate potential peptides for peptide identification in proteomics. The PPG built using the 47 human proteomes from the Human Pangenome Reference Consortium (HPRC) coupled with UniProt human proteins resulted in more than 4.2M tryptic peptides, a 26% increase as compared to when only the UniProt proteins were included. Graph-based analysis of the PPG revealed a giant disconnected component with about 3M nodes, suggesting substantial sharing of tryptic peptides among proteins. We applied tryptic peptides derived from PPG to characterize three collections of human proteomic and metaproteomic datasets, and our results showed that by exploiting the human pangenome, we were able to increase the number of identified peptides on all datasets we tested (about 8% increase across all three collections). We also showed that using more complete human proteome would be useful for reducing potential misidentification of human peptides as microbial peptides, a problem that was previously studied but based on genomic sequencing data. Our tool for building PPG is available in a GitHub repo PPGpep, and PPG-derived tryptic peptides can be utilized by MetaProD, a pipeline for both human and bacterial peptide and protein identification from (meta)proteomics datasets.
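
Editor's illustration (not from the preprint): the abstract describes a graph whose nodes are tryptic peptides and which is traversed depth-first to generate candidate peptides. The toy Python sketch below shows one way such a structure could look. The cleavage rule (after K or R, not before P) is the standard trypsin rule; the edge definition (an edge links peptides that are adjacent in some protein) and the use of the traversal depth to allow missed cleavages are assumptions made here for illustration and are not taken from PPGpep. All function names are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_peptides(protein):
    """In-silico trypsin digestion: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def build_peptide_graph(proteins):
    """Nodes are tryptic peptides; an edge a -> b means b follows a in some protein.

    Peptides shared between proteins collapse into a single node.
    """
    graph = defaultdict(set)
    for seq in proteins:
        peptides = tryptic_peptides(seq)
        for pep in peptides:
            graph[pep]                      # make sure every peptide is a node
        for a, b in zip(peptides, peptides[1:]):
            graph[a].add(b)
    return graph


def candidate_peptides(graph, max_missed=1):
    """Depth-first traversal that joins up to max_missed + 1 adjacent peptides."""
    results = set()

    def dfs(node, path):
        results.add("".join(path))
        if len(path) - 1 < max_missed:      # depth limit also guards against cycles
            for nxt in sorted(graph[node]):
                dfs(nxt, path + [nxt])

    for node in list(graph):
        dfs(node, [node])
    return results


# Two made-up protein variants that share the first tryptic peptide "MK".
toy_graph = build_peptide_graph(["MKAAR", "MKCCR"])
toy_candidates = candidate_peptides(toy_graph, max_missed=1)
```

In this toy case the shared peptide "MK" is stored once and branches to both variant peptides, which is the kind of peptide sharing among proteins that the abstract points to.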

8

Measuring lactulose and mannitol levels using liquid chromatography coupled with tandem mass spectrum: application to clinical study of intestinal epithelium barrier function

Magalhaes, L. M. C.; Rodrigues, F. A. d. P.; Filho, J. Q.; Gondim, R. N. D. G.; Ribeiro, S.; Sousa, J. K.; Clementino, M.; Maciel, B. L. L.; Havt, A.; Santos, A. A. d.; Magalhaes, P. J. C.; Lima, A. A. M.

2022-10-14 pharmacology and therapeutics 10.1101/2022.10.11.22280641 medRxiv

Top 0.1%

24.0%

Lactulose and mannitol are used to assess intestinal permeability, and several methodologies have been applied to measure them. Objectives: This study aimed to validate a high-performance liquid chromatography method coupled with tandem mass spectrometry to measure the sugars mannitol and lactulose. Material and Methods: We used a high-performance liquid chromatography (HPLC) system coupled to an AB Sciex Q-TRAP 5500 triple quadrupole mass spectrometer (MS/MS) with an AB Sciex Electro Nebulization Interface (ESI) (Framingham, MA, USA). For the separation of lactulose and mannitol in the HPLC, the analytical column HILIC-ZIC® from ES Industries (West Berlin, USA) was used. The parameters analyzed for analytical validation were specificity/selectivity, linearity, LD, LQ, accuracy, precision (repeatability and intermediate precision) and matrix effect. Results: The accuracy was demonstrated from the recovery at three concentration levels (100, 500 and 1000 ng/mL) and in triplicate, which showed recovery values above the recommended (>120%). Intermediate precision was determined at 24-hour intervals and the coefficients of variation found were less than 8.7%. The matrix effect was measured through the retention times in the standard samples and in the samples of the spiked standards in dilutions with urine samples, which varied between 99.3% and 100.3%. Urine samples from malnourished and healthy children were analyzed. The L:M ratio was considerably lower in the control group compared to the MN group (p<0.0001) and the mannitol excretion rate was higher (p<0.0001). Conclusions: The results showed that the HPLC-MS/MS method was sensitive, specific, and accurate for the determination of molecular biomarkers of lactulose and mannitol. In addition, the L:M test is a functional test capable of determining with high sensitivity the barrier function damage of the intestinal epithelium in children with malnutrition compared to healthy control children.

9

Detecting predicted cancer-testis antigens in proteomics datasets of healthy and tumoral samples

Machado, K. C. T.; Fiuza, T. D. S.; De Souza, S. J.; De Souza, G. A.

2024-06-09 bioinformatics 10.1101/2024.06.08.597624 medRxiv

Top 0.1%

23.1%

Biomarkers are molecular markers found in clinical samples which may aid disease diagnosis or prognosis. High-throughput techniques allow prospecting for such signature molecules by comparing gene expression between normal and diseased cells. Cancer-testis antigens (CTAs) are promising candidates for cancer biomarkers because their expression is restricted to the testis in normal conditions but is aberrant in various tumors. CTAs are routinely identified by transcriptomics, but a comprehensive characterization of their protein levels in different tissues is still necessary. Mass spectrometry-based proteomics allows the characterization of many cellular types and the production of large amounts of data while computational tools allow the comparison of multiple datasets, and together those may corroborate insights obtained at the transcriptomic level. Here a computational meta-analysis explores the protein abundance of CTAs in the proteomic layer of healthy and tumor tissues. The combined datasets present the expression patterns of 17,200 unique proteins, including 241 known CTAs previously described at the transcriptomic level. Those were further ranked as significantly enriched in tumor tissues (22 proteins), exclusive to tumor tissues (42 proteins) or abundant in healthy tissues (32 proteins). This analysis illustrates the possibilities for tumor proteome characterization and the consequent identification of biomarker candidates and/or therapeutic targets.

10

High resolution, proteome-wide mapping of subcellular protein localization in plants

van Schie, M.; Roosjen, M.; Albrecht, C.; van Marsdijk, J.; Weijers, D.

2026-03-02 plant biology 10.64898/2026.02.27.708449 medRxiv

Top 0.1%

22.6%

Protein function is intimately connected to subcellular localization, and experimental determination of protein localization is a key element of understanding biological roles. However, even in the best-studied model plants, such as Arabidopsis thaliana, a minority of proteins has an experimentally defined subcellular localization. We present an experimental strategy to globally map plant subcellular proteomes by mass spectrometry. We annotated subcellular localization of 7815 proteins in Arabidopsis roots, 4672 in Arabidopsis seedlings, and 2782 in the liverwort Marchantia polymorpha. By independent validation, we find that these annotations are highly predictive and can be integrated with other proteomics datasets. Cross-species comparisons reveal substantial global conservation of subcellular localization. Furthermore, we demonstrate that the same approach can be used to identify dynamically translocating proteins upon treatment or in a mutant. This work shows the power of global spatial proteome mapping in plants and offers an extensive resource for protein subcellular localization in plants.
Highlights:
- Optimized approach for global mapping of protein subcellular localization by differential centrifugation in plants
- Interactive resource of subcellular localization of plant proteins at unprecedented depth and resolution
- Cross-species comparison reveals that the plant subcellular proteome is deeply conserved
- Comparative subcellular proteomics of a Brefeldin A treatment and a gnom mutant robustly describes global shifts in protein localization

11

SPROUTS_DB: an implemented database of contaminants for extracellular vesicle proteomics studies

Pittala, M. G. G.; Leggio, L.; Paterno, G.; Giusto, E.; Civiero, L.; Cunsolo, V.; Vivarelli, S.; Di Francesco, A.; Alpi, E.; Saletti, R.; Iraci, N.

2025-05-21 cell biology 10.1101/2025.05.20.655024 medRxiv

Top 0.1%

19.7%

Background: Current proteomics techniques allow rapid identification and quantification of proteins within any given biological source. In particular, nanoUHPLC/High-Resolution nanoESI-MS/MS enables the characterization of proteins in complex biological samples due to its high sensitivity, accuracy, and scalability. However, LC-MS/MS proteomics might still be susceptible to laboratory and sample-associated contaminants, which can significantly compromise the quality and reliability of data. Therefore, an accurate identification and annotation of such contaminants is crucial for the development of robust proteomics databases and spectral-libraries related search engines. This approach is of special interest in the field of secretome and extracellular vesicles (EVs), membrane-enclosed nanostructures that contain a variety of proteins crucial for cell-to-cell communication and translational applications. Results: When working in ex vivo/in vitro settings, proteins from fetal bovine serum (FBS), commonly employed in standard cell culture media, may interfere with the proteome analysis. To address this issue, we conceived and designed SPROUTS_DB, Serum Protein Repository Of Unwanted Target(ed) Sequences DataBase, a dedicated resource to catalog serum-derived contaminants. Starting from media supplemented with EV-depleted FBS, we simulated cell growth conditions - in the absence of cells - followed by ultracentrifugation. LC-MS/MS analysis of these samples resulted in the identification of a novel set of 1,288 contaminant proteins, which has been deposited in the ProteomeXchange repository (identifier PXD044137). SPROUTS_DB contains primarily soluble proteins, mainly related to the Gene Ontology categories Extracellular Region and Extracellular Space, in line with the nature of the starting sample. In contrast, only a small fraction of the contaminants is classified as membrane-associated proteins, supporting the limited vesicle contamination in the complete medium, due to the use of EV-depleted FBS. Of note, we demonstrated that SPROUTS_DB outperforms existing contaminants databases, ensuring that only peptide spectra relevant to the examined sample are retained and identified as true positive data. Conclusions: Considering that even proteins from phylogenetically distant organisms share extensive stretches of sequences, SPROUTS_DB is designed to discern contaminants from real sample proteins of interest, minimizing false positive identifications. To the best of our knowledge, SPROUTS_DB is the most updated database of contaminants useful for proteomics investigations of cellular secretomes and EV-containing samples.

12

Comprehensive Proteomic Quantification of Bladder Stone Progression in a Cystinuric Mouse Model Using Data-Independent Acquisitions

Rose, J.; Basisty, N.; Zee, T.; Wehrfritz, C.; Bose, N.; Desprez, P.-Y.; Kapahi, P.; Stoller, M.; Schilling, B.

2021-04-06 cell biology 10.1101/2021.04.06.438573 medRxiv

Top 0.1%

19.7%

Cystinuria is one of various disorders that cause biomineralization in the urinary system, including bladder stone formation in humans. It is most prevalent in children and adolescents and more aggressive in males. There is no cure, and only limited disease management techniques help to solubilize the stones. Recurrence, even after treatment, occurs frequently. Other than a buildup of cystine, little is known about factors involved in the formation, expansion, and recurrence of these stones. This study sought to define the growth of bladder stones, guided by micro-computed tomography imaging, and to profile dynamic stone proteome changes in a cystinuria mouse model. After bladder stones developed in vivo, they were harvested and separated into four developmental stages (sand, small, medium and large stone), based on their size. Data-dependent and data-independent acquisitions allowed deep profiling of stone proteomics. The proteomic signatures and pathways illustrated major changes as the stones grew. Stones initiate from a small nidus, grow outward, and show major enrichment in ribosomal proteins and factors related to coagulation and platelet degranulation, suggesting a major dysregulation in specific pathways that can be targeted for new therapeutic options.

13

Temporal variation in lymphocyte proteomics

McCown, M. A.; Allen, C.; Machado, D. D.; Boekweg, H.; Liang, Y.; Nwosu, A. J.; Kelly, R. T.; Payne, S. H.

2021-07-30 bioinformatics 10.1101/2021.07.29.454362 medRxiv

Top 0.1%

19.5%

Chronic Lymphocytic Leukemia (CLL) is a slow progressing disease, characterized by a long asymptomatic stage followed by a symptomatic stage during which patients receive treatment. While proteomic studies have discovered differential pathways in CLL, the proteomic evolution of CLL during the asymptomatic stage has not been studied. In this pilot study, we show that by using small sample sizes comprising ~145 cells, we can detect important features of CLL necessary for studying tumor evolution. Our small samples are collected at two time points and reveal large proteomic changes in healthy individuals over time. A meta-analysis of two CLL proteomic papers showed little commonality in differentially expressed proteins and demonstrates the need for larger control populations sampled over time. To account for proteomic variability between time points and individuals, large control populations sampled at multiple time points are necessary for understanding CLL progression. Data is available via ProteomeXchange with identifier PXD027429.

14

Implementation and Evaluation of Support Vector Machine-Based Models for Cancer Detection Using Multi-Omic Data: A Systematic Review

Mohamadi, Z.; Abtahi, E.; Shayegh, Z. S.; Ataei Kachouei, M.; Fakhar, A.; Shirani, M. M.; Malekian, M.; Zinatshoar, A.; Biglari, M.; Rezaei, F.; ZarinKhat, A.; Mohammadi, R.

2025-07-11 cell biology 10.1101/2025.07.10.664049 medRxiv

Top 0.1%

19.5%

Introduction: Cancer is a major source of mortality and morbidity worldwide, causing more than 19 million new cases and nearly 10 million deaths in 2020. Despite many advances in cancer diagnosis, established methods such as imaging and serum biomarkers often lack the necessary sensitivity and specificity, particularly for early-stage detection. However, most of the studies depend on internal validation, which raises concerns about the generalizability of their findings. To improve the dependability of SVM applications in clinical fields, the review emphasizes the necessity of external validation and established techniques. Overall, combining AI with omics technology suggests a promising way to improve cancer detection, which could lead to better outcomes and more affordable medical treatments. Methods: This systematic review was conducted using the PRISMA 2020 principles and registered on the Open Science Framework. A comprehensive search of several databases was conducted, including PubMed/MEDLINE, Scopus, Google Scholar, and Web of Science. Data was screened using RAYYAN.ai, which uses artificial intelligence methods to help with decision-making and screening. All original English-language studies that employed SVM to build a model for diagnosing a type of human malignancy were included. The full text of the articles was extracted, and the quality of the articles and risk of bias were assessed using the PROBAST tool. Results: A total of 104 studies were identified, of which 99 were included after 5 were excluded because the full text was unavailable. The studies covered various types of omics, such as proteomics (41 studies), transcriptomics (30 studies), genomics (19 studies), metabolomics (11 studies), epigenomics (4 studies), radiomics (2 studies), immunomics (1 study), and multi-omics (8 studies). 63 studies were internally validated, and 29 were externally validated; however, 2 studies were both internally and externally validated. Conclusion: The review of 99 studies on Support Vector Machine-based models highlights their potential in improving cancer diagnosis. The study emphasizes the importance of proteomics studies in understanding tumor biology and developing effective diagnostic methods. However, concerns about their generalizability and trustworthiness in medical settings persist.

15

Assessment of the potential use of VAL-1221 for Lafora disease: MS-based proteomics for the characterization and quantitation of the biotechnological drug in plasma and cerebrospinal fluid

Esposito, E.; Caravelli, A.; Muccioli, L.; Cancellerini, C.; Tappata, M.; Pizzi, E.; Minardi, R.; DEFEAT-LD Study Group; Carelli, V.; Vignatelli, L.; Michelucci, R.; Bisulli, F.; Fiori, J.

2025-09-21 pharmacology and therapeutics 10.1101/2025.09.17.25335891 medRxiv

Top 0.1%

19.3%

Background: VAL-1221 is a biotechnological fusion protein that combines the Fab portion of a cell-penetrating antibody with recombinant human acid α-glucosidase. Originally used for the treatment of Pompe disease, it has since attracted interest for possible repurposing in Lafora disease (LD), an ultra-rare, fatal form of progressive myoclonus epilepsy characterized by the accumulation of polyglucosan aggregates (Lafora bodies, LBs) within the central nervous system (CNS). Given its design, which includes a cell-penetrating domain, VAL-1221 has been hypothesized to cross the blood-brain barrier (BBB) and target pathogenic glycogen deposits within the CNS in LD. This study aimed to investigate the presence of VAL-1221 in plasma and cerebrospinal fluid (CSF) of LD patients and assess its potential to cross the BBB, using high-resolution mass spectrometry coupled with micro-liquid chromatography (microLC-HRMS/MS). Methods: As part of a compassionate use program, five LD patients received intravenous VAL-1221 (20 mg/kg, every other week). Untreated LD patients were included as controls. Plasma samples were collected at multiple time points up to 24 hours post-infusion, and CSF samples were obtained based on concentration profiles. Untargeted-to-targeted bottom-up proteomics were used to detect a unique peptide tag in biological fluids. Method validation included assessments of precision, accuracy, matrix effects, and analyte stability. Results: VAL-1221 was consistently detected in plasma up to 4 hours post-infusion, while no VAL-1221 was detected in CSF samples at the method's limit of detection. The validated method showed high sensitivity, precision (RSD ≤15%), accuracy (RE ≤15%), and acceptable matrix effect. Recovery was optimal for CSF; it was low in plasma (Rec% > ±20). Conclusion: VAL-1221 was reliably detected in plasma after infusion, but no measurable levels were observed in CSF based on the validated method's sensitivity. These findings suggest that the drug, when administered intravenously, may not reach the central nervous system, indicating that this route may not be appropriate for efficacy.

16

The Profiling of Bisecting N-acetylglucosamine (GlcNAc) Modification in Human Amniotic Membrane by Glycomic and Glycoproteomic Analyses

Chen, Q.; Zhang, Y.; Zhang, K.; Liu, J.; Pan, H.; Wang, X.; Li, S.; Hu, D.; Lin, Z.; Zhao, Y.; Hou, G.; Guan, F.; Li, H.; Liu, S.; Ren, Y.

2020-06-09 cell biology 10.1101/2020.06.09.141168 medRxiv

Top 0.1%

19.2%

It is acknowledged that the bisecting N-acetylglucosamine (GlcNAc) structure, a GlcNAc linked to the core β-mannose residue via a β1,4 linkage, represents a special type of N-glycosylated modification and has been reported to be involved in various biological processes, such as cell adhesion and fetal development. Clark et al. found that the majority of N-glycans in human trophoblasts bear a bisecting GlcNAc. This type of glycan has been reported to help trophoblasts resist natural killer (NK) cell-mediated cytotoxicity, which would provide a possible explanation for how the mother can nourish a fetus within herself without rejection. Herein, we hypothesized that the human amniotic membrane, which is the last barrier for the fetus, may also express bisecting-type glycans to protect the fetus. To test this hypothesis, glycomic analysis of human amniotic membrane was performed, and bisecting N-glycans were detected at high abundance. In addition, we re-analyzed our proteomic data with high fractionation and amino acid sequence coverage from human amniotic membrane, which had been released for the exploration of human missing proteins. The presence of bisecting GlcNAc peptides was revealed and confirmed. A total of 41 glycoproteins with 43 glycopeptides were found to possess a bisecting GlcNAc, 25 of which are reported for the first time to carry this type of modification. These results provide a profile of bisecting GlcNAc modification in human amniotic membrane and will benefit functional studies of glycoproteins with bisecting GlcNAc modification and studies of immune suppression in the human placenta. The mass spectrometry placenta data are available via ProteomeXchange (PXD010630).

17

Community Resource: A Genome-Based Extension of Large-Scale Wheat Proteogenomics

Vincent, D.; Appels, R.

2026-07-08 plant biology 10.64898/2026.06.17.733048 medRxiv

Top 0.1%

19.1%


Bread wheat (Triticum aestivum L.) possesses a large and highly repetitive allohexaploid genome, and its annotation requires extensive protein-level validation. We developed a genome-based wheat proteogenomics workflow integrating large-scale MS/MS reanalysis, GFF3-based peptide coordinate reconstruction, thorough validation, and genome browser-compatible peptide deployment against the IWGSC RefSeq v2.1 reference genome. Public wheat proteomics datasets comprising 577 raw mass spectrometry files (~1.0 TB) from 32 tissues were reprocessed using FragPipe/MSFragger, generating 2,226,779 non-redundant peptides and 1,648,740 unique protein accessions. Peptide-to-genome projections using GFF3 annotation files produced 8,291,056 genomic peptide projected rows, of which 98.14% passed validation procedures. Overall, peptide evidence supported 103,095 high-confidence (HC) and 135,495 low-confidence (LC) wheat gene models, corresponding to 96.4% and 84.7% of all parsed HC and LC annotations, respectively. In total, 238,590 wheat gene models (89.4% of all parsed annotations) received protein-level support. Apollo/JBrowse-compatible BED tracks enabled exon-resolved visualisation of peptide evidence across wheat chromosomes. Together, this study establishes a scalable GFF3-based proteogenomics framework for complex polyploid plant genomes and provides an extensive community resource for wheat genome annotation refinement and visual exploration (https://bread-wheat-um.genome.edu.au/apollo/49826/jbrowse/index.html).
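
As a rough illustration of what a GFF3-based peptide-to-genome projection can look like, the sketch below maps a peptide's position within a protein onto the CDS segments of its gene model and emits BED-style rows. It is not the authors' workflow: the function names `project_peptide` and `bed_lines`, the chromosome and peptide labels, and the simplifying assumption that the CDS starts at the first codon with phase 0 are all hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]


def project_peptide(cds: List[Block], strand: str,
                    aa_start: int, aa_len: int) -> List[Block]:
    """Map a peptide onto genomic blocks (hypothetical helper).

    cds      -- CDS segments as 1-based inclusive (start, end), as in GFF3
    strand   -- "+" or "-"
    aa_start -- 1-based position of the peptide's first residue in the protein
    aa_len   -- peptide length in residues
    Returns 0-based half-open (start, end) blocks in ascending genomic order,
    which is the coordinate convention used by BED.
    Assumes the CDS begins at codon 1 with phase 0 (a simplification).
    """
    # Walk the segments in transcript order (reversed on the minus strand).
    segments = sorted(cds, reverse=(strand == "-"))
    nt_start = (aa_start - 1) * 3      # 0-based offset into the spliced CDS
    nt_end = nt_start + aa_len * 3     # exclusive end offset
    blocks: List[Block] = []
    offset = 0                         # CDS offset at the start of this segment
    for seg_start, seg_end in segments:
        seg_len = seg_end - seg_start + 1
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + seg_len)
        if lo < hi:                    # the peptide overlaps this segment
            if strand == "+":
                g_start = seg_start - 1 + (lo - offset)
                g_end = seg_start - 1 + (hi - offset)
            else:
                g_end = seg_end - (lo - offset)
                g_start = seg_end - (hi - offset)
            blocks.append((g_start, g_end))
        offset += seg_len
    return sorted(blocks)


def bed_lines(chrom: str, blocks: List[Block], name: str, strand: str) -> List[str]:
    """Format one BED6 row per block, giving an exon-resolved view."""
    return [f"{chrom}\t{s}\t{e}\t{name}\t0\t{strand}" for s, e in blocks]


# Illustrative two-exon gene model; all values are made up.
example_cds = [(101, 109), (201, 212)]
for line in bed_lines("chr1A", project_peptide(example_cds, "+", 2, 3), "pep1", "+"):
    print(line)
```

A peptide that spans an exon junction yields one row per exon, which is one plausible way to obtain exon-resolved peptide tracks for a genome browser. A production workflow would also need to handle CDS phase, partial gene models, and peptides shared between homoeologous genes.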

18

A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation

Rodriguez, J. M.; Maquedano, M.; Cerdan-Velez, D.; Calvo, E.; Vazquez, J.; Tress, M. L.

2024-11-15 genomics 10.1101/2024.11.14.623419 medRxiv

Top 0.1%

19.1%


The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNASeq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.

19

Urine-HILIC: Automated sample preparation for bottom-up urinary proteome profiling in clinical proteomics

Govender, I. S.; Mokoena, R. J.; Stoychev, S. H.; Naicker, P.

2023-07-28 biochemistry 10.1101/2023.07.27.550780 medRxiv

Top 0.1%

19.1%


Urine provides a diverse source of information related to a patient's health status and is ideal for clinical proteomics because of its ease of collection. To date, there is no standard operating procedure for reproducible and robust urine sample preparation for mass spectrometry-based clinical proteomics. To this end, a novel workflow was developed based on on-bead protein capture, clean-up, and digestion without the requirement for processing steps such as precipitation or centrifugation. The workflow was applied to an acute kidney injury (AKI) pilot study. Urine from clinical samples and a pooled sample were subjected to automated sample preparation in a KingFisher Flex magnetic handling station using a novel urine-HILIC (uHLC) approach based on MagReSyn® HILIC microspheres. For benchmarking, the pooled sample was also prepared using a published protocol based on an on-membrane (OM) protein capture and digestion workflow. Peptides were analysed by LC-MS in data-independent acquisition (DIA) mode using a Dionex Ultimate 3000 UPLC coupled to a Sciex 5600 mass spectrometer. Data were searched in Spectronaut 17. Both workflows showed similar peptide and protein identifications in the pooled sample. The uHLC workflow was easier to set up and complete, with less hands-on time and fewer manual processing steps than the OM method. Lower peptide and protein coefficients of variation (CVs) were observed in the uHLC technical replicates. Following statistical analysis, candidate protein markers were filtered at ≥ 2-fold change in abundance, ≥ 2 unique peptides and ≤ 1% false discovery rate, which revealed many significant, differentially abundant kidney injury-associated urinary proteins. The pilot data derived using this novel workflow provide information on the urinary proteome of patients with AKI. Further exploration in a larger cohort using this novel high-throughput method is warranted.
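
The three filtering thresholds named in this abstract can be expressed as a short sketch. The record layout, the field names, and the reading of "2-fold change" as a linear ratio in either direction are assumptions made for illustration; the abstract does not say whether the 1% false discovery rate refers to identification or to the differential test, so `q_value` here is a placeholder for whichever applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinResult:
    protein: str            # hypothetical identifier
    fold_change: float      # linear abundance ratio, case / control (assumed)
    unique_peptides: int
    q_value: float          # FDR-adjusted value (assumed meaning)


def filter_candidates(results: List[ProteinResult],
                      min_fold: float = 2.0,
                      min_peptides: int = 2,
                      max_fdr: float = 0.01) -> List[ProteinResult]:
    """Keep proteins that meet all three thresholds."""
    kept = []
    for r in results:
        # Treat a 2-fold change as either an increase (ratio >= 2)
        # or a decrease (ratio <= 1/2).
        changed = r.fold_change >= min_fold or r.fold_change <= 1.0 / min_fold
        if changed and r.unique_peptides >= min_peptides and r.q_value <= max_fdr:
            kept.append(r)
    return kept


# Made-up example values, not data from the study.
demo = [
    ProteinResult("P1", 3.1, 4, 0.002),   # passes all three filters
    ProteinResult("P2", 0.4, 2, 0.010),   # 2.5-fold decrease, passes
    ProteinResult("P3", 1.5, 6, 0.001),   # change too small
    ProteinResult("P4", 5.0, 1, 0.001),   # too few unique peptides
    ProteinResult("P5", 2.2, 3, 0.050),   # FDR too high
]
print([r.protein for r in filter_candidates(demo)])
```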

20

High-throughput proteome profiling with low variation in a multi-center study using dia-PASEF

Kaspar-Schoenefeld, S.; Krieger, J. R.; Martelli, C.; Koenig, A.-C.; Hauck, S.; Johansson, S.; Karger, A.; Ohmayer, U.; Pecoraro, M.; Tenzer, S.; Distler, U.; Braga-Lagache, S.; Strohmidel, P.; Abel, L.; Schuster, R.; Kliewer, G.; Kroninger, T.; Heikaus, L.; Assis, D.; Mueller, T.; Hornburg, D.

2024-06-02 systems biology 10.1101/2024.05.29.596405 medRxiv

Top 0.1%

19.0%


High-throughput proteomics is gaining increasing traction as it facilitates screening of large sample cohorts required in clinical research and systems biology studies. Recent developments in mass spectrometry-based proteomics have resulted in improved hardware and software providing deep proteome coverage, robustness, and scale accessible to a wide range of laboratories. Here, we benchmark dia-PASEF, a data-independent acquisition scheme that integrates trapped ion mobility with high scan speed, with a high-resolution time-of-flight mass analyzer (timsTOF HT) for the deep proteome analysis of a human cell line applying short 5-minute gradients. To show intra- and interlaboratory reproducibility, we performed a multi-laboratory study including 11 sites. We demonstrate that on average 7,072 protein groups and 99,835 peptides were identified in human chronic myelogenous leukemia cells on the timsTOF HT with low variation. Our results underline that dia-PASEF data acquisition combined with reproducible chromatography enables high robustness and data consistency across instruments and laboratories, which is a prerequisite for translational biomedical insights.
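
One common way to quantify "low variation" in a multi-site study is the coefficient of variation (CV), computed within each site and across site means. The sketch below is an assumption about how such a summary could be produced, not the authors' analysis; the function names, site labels, and replicate values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def cv_percent(values: List[float]) -> float:
    """Coefficient of variation in percent (sample standard deviation / mean).

    Needs at least two values; statistics.stdev raises StatisticsError otherwise.
    """
    return 100.0 * stdev(values) / mean(values)


def site_cvs(measurements: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Return (intra-laboratory CV per site, inter-laboratory CV of site means)."""
    intra = {site: cv_percent(vals) for site, vals in measurements.items()}
    inter = cv_percent([mean(vals) for vals in measurements.values()])
    return intra, inter


# Made-up protein-group counts for three replicate injections at three sites.
counts = {
    "site_A": [7050.0, 7100.0, 7080.0],
    "site_B": [6990.0, 7020.0, 7005.0],
    "site_C": [7120.0, 7090.0, 7110.0],
}
intra_cv, inter_cv = site_cvs(counts)
for site, cv in intra_cv.items():
    print(f"{site}: intra-laboratory CV = {cv:.2f}%")
print(f"inter-laboratory CV = {inter_cv:.2f}%")
```

The same functions apply equally to peptide counts or to per-protein quantities, provided each site contributes at least two replicates.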