PROTEOMICS
Wiley
All preprints, ranked by how well they match PROTEOMICS's content profile, based on 35 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
McCracken, N. A.; Liu, H.; Runnebohm, A. M.; Wijeratne, H. S.; Wijeratne, A. B.; Staschke, K. A.; Mosley, A. L.
Thermal Proteome Profiling (TPP) is an invaluable tool for functional proteomics studies that has been shown to discover changes associated with protein-ligand, protein-protein, and protein-RNA interaction dynamics along with changes in protein stability resulting from cellular signaling. The increasing number of reports employing this assay has not been matched by improvements in the quality and sensitivity of the corresponding data analysis. The gap between data acquisition and data analysis tools is even more apparent as TPP findings have reported more subtle melt shift changes related to protein post-translational modifications. In this study, we have improved the Inflect data analysis pipeline (now referred to as InflectSSP, available at https://CRAN.R-project.org/package=InflectSSP) to increase the sensitivity of detection for both large and subtle changes in the proteome as measured by TPP. Specifically, InflectSSP now has integrated statistical and bioinformatic functions to improve objective functional proteomics findings from the quantitative results obtained from TPP studies through increasing both the sensitivity and specificity of the data analysis pipeline. To benchmark InflectSSP, we have reanalyzed two publicly available datasets to demonstrate the performance of this publicly available R-based program for TPP data analysis. Additionally, we report new findings following temporal treatment of human cells with the small molecule Thapsigargin, which induces the unfolded protein response (UPR). InflectSSP analysis of our UPR study revealed highly reproducible target engagement over time while simultaneously providing new insights into the dynamics of UPR induction.
Wang, J.; Tian, X.; Yu, W.; Pullman, B.; Bullen, J.; Hurt, E.; Zhong, W.
Background: The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) recently generated harmonized genomic, transcriptomic, proteomic, and clinical data for over 1,000 tumors across 10 cohorts to facilitate pan-cancer discovery research. However, protein expression comparison across CPTAC cohorts remains challenging due to non-uniform missing data and varying protein expression distribution patterns across tumor types. Here, we present our efforts to evaluate various missing data handling and normalization strategies to create a normalized pan-cancer protein expression dataset. Results: First, we developed a novel algorithm to select robustly expressed proteins in tumors within any CPTAC cohort. Second, we applied a cohort hybrid imputation approach to protein abundance values from FragPipe within each cohort based on protein expression distribution patterns. Third, we calculated intensity-based absolute quantification using protein abundance values and applied both global and smooth quantile normalization methods. Our results indicate that global quantile normalization ensured identical distribution across cohorts for both tumor and normal tissues, while smooth quantile normalization preserved distribution differences between biological conditions. We assessed our method by comparing differential protein expression analysis results with and without normalization. Additionally, we examined the ranks of protein expression in the normalized CPTAC dataset for selected proteins with high protein-to-RNA expression correlation across CPTAC cohorts. We then compared these protein expression ranks with their RNA expression ranks across corresponding cohorts in The Cancer Genome Atlas (TCGA). Differential protein expression analysis revealed a high level of agreement in the fold change of tumor versus normal tissue within cohorts before and after normalization.
Furthermore, our results indicate that global quantile normalization resulted in the highest cohort rank correlation between CPTAC and TCGA for selected proteins. Conclusions: In summary, our thorough analysis demonstrates that global quantile normalization surpasses both smooth quantile normalization and no normalization, as evidenced by its higher rank correlation across cancer cohorts between CPTAC and TCGA for selected proteins. The findings suggest that combining cohort hybrid imputation with global quantile normalization is an effective method for creating a normalized CPTAC pan-cancer protein dataset, which can facilitate the study of protein expression across different cancer types.
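To make concrete why global quantile normalization yields identical distributions across cohorts, here is a minimal sketch of the textbook procedure. It is not the authors' implementation: the function name `quantile_normalize`, the dictionary input, and the assumption that every cohort column is complete (already imputed) and of equal length are illustrative choices. Smooth quantile normalization is not shown.

```python
def quantile_normalize(columns):
    """Textbook global quantile normalization (illustrative sketch).

    columns: dict mapping a cohort name to a list of abundance values.
    Assumption: all lists have the same length and contain no missing values.
    """
    names = list(columns)
    n = len(columns[names[0]])
    if any(len(columns[name]) != n for name in names):
        raise ValueError("all columns must have the same length")

    # Reference distribution: mean of the r-th smallest value across columns.
    sorted_cols = [sorted(columns[name]) for name in names]
    reference = [sum(col[r] for col in sorted_cols) / len(names) for r in range(n)]

    normalized = {}
    for name in names:
        values = columns[name]
        # Indices of the values in ascending order (ties keep input order).
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference quantile
        normalized[name] = out
    return normalized


if __name__ == "__main__":
    demo = {"cohort_A": [5.0, 2.0, 3.0], "cohort_B": [4.0, 1.0, 6.0]}
    print(quantile_normalize(demo))
```

After this transformation every column contains exactly the same set of values, only in a different order, which is what "identical distribution across cohorts" means in practice.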
Gallo, M. C. R.; Li, Q.; Talasila, M.; Uhrig, R. G.
A major limitation when undertaking quantitative proteomic time-course experimentation is the tradeoff between depth-of-analysis and speed-of-analysis. In high-complexity and high-dynamic-range sample types, such as plant extracts, the need to balance resolution and time is especially apparent. To address this, we evaluate multiple compensation voltage (CV) High Field Asymmetric Waveform Ion Mobility Spectrometry (FAIMSpro) settings using the latest label-free single-shot Orbitrap-based DIA acquisition workflows for their ability to deeply quantify the Arabidopsis thaliana seedling proteome. Using a BoxCarDIA acquisition workflow with a -30, -50, -70 CV FAIMSpro setting, we are able to consistently quantify >5000 Arabidopsis seedling proteins over a 21-minute gradient, facilitating the analysis of ~42 samples per day. Utilizing this acquisition approach, we then quantified proteome-level changes occurring in Arabidopsis seedling shoots and roots over 24 h of salt and osmotic stress, to identify early and late stress response proteins and reveal stress response overlaps. Here, we successfully quantify >6400 shoot and >8500 root protein groups, respectively, quantifying nearly 9700 unique protein groups in total across the study. Collectively, we pioneer a short-gradient, multi-CV FAIMSpro BoxCarDIA acquisition workflow that represents an exciting new analysis approach for undertaking quantitative proteomic time-course experimentation in plants.
Lazari, L. C.; Azemi, G.; Russo, C.; Fernandes, L. R.; Marie, S. K. N.; Di Ieva, A. C.; Palmisano, G.
Data pre-processing is a critical step in the analysis of MALDI-TOF MS spectra for machine learning applications, typically involving steps such as spectra trimming, baseline correction, smoothing, transformation, and peak picking or spectral binning. While traditional approaches focus on protein/peptide peaks as features, this study explores a novel method of feature extraction by treating MALDI-TOF spectra as time-series data. This study investigates the use of computational fractal-based analysis to assess the complexity of MALDI-TOF spectra. Fractal analysis, previously successful in glioblastoma diagnosis using MRI, was applied here to proteomics data for the first time. By treating each MALDI spectrum as a time series and calculating its fractal dimension using various algorithms, machine learning models were trained to differentiate between glioblastoma patients and healthy controls. We demonstrate that fractals are sufficient to obtain accurate models for glioblastoma diagnosis, despite still underperforming when compared to the traditional feature extraction method. We also show that fractals can be used as support features to increase model performance. This work highlights the potential and limitations of fractal analysis in proteomics, offering a new perspective for disease diagnosis and broadening the applicability of time-series data analysis in mass spectrometry.
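As an illustration of what "calculating a fractal dimension of a spectrum treated as a time series" can look like, the sketch below implements the Higuchi estimator in plain Python. The abstract does not say which algorithms were used, so the choice of Higuchi, the function name `higuchi_fd`, and the default `k_max=10` are assumptions made only for this example.

```python
import math


def higuchi_fd(series, k_max=10):
    """Estimate the Higuchi fractal dimension of a 1-D sequence.

    Assumptions: the series is not constant and has at least k_max + 2 points.
    """
    n = len(series)
    if n < k_max + 2:
        raise ValueError("series too short for the requested k_max")

    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offsets 0 .. k-1
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(
                abs(series[m + i * k] - series[m + (i - 1) * k])
                for i in range(1, steps + 1)
            )
            # Normalised curve length for this offset and lag.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / len(lengths)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))

    # Least-squares slope of log L(k) against log(1/k) is the estimate.
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((x - mx) * (y - my) for x, y in zip(log_inv_k, log_len))
    den = sum((x - mx) ** 2 for x in log_inv_k)
    return num / den


if __name__ == "__main__":
    ramp = [float(i) for i in range(500)]
    print(higuchi_fd(ramp))  # a straight line has dimension close to 1
```

A smooth ramp gives a value near 1 and white noise a value near 2, so the single number summarises how rough an intensity trace is. In a workflow like the one described, such a value would become one feature per spectrum for a downstream classifier.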
Boughter, C. T.
Immunopeptidomics is a growing subfield of proteomics that has the potential to shed new light on a long-neglected aspect of adaptive immunology: a comprehensive understanding of the peptides presented by major histocompatibility complexes (MHC) to T cells. As the field of immunopeptidomics continues to grow and mature, a parallel expansion in the methods for extracting quantitative features of these peptides is necessary. Currently, massive experimental efforts to isolate a given immunopeptidome are summarized in tables and pie charts, or worse, entirely thrown out in favor of singular peptides of interest. Ideally, an unbiased approach would dive deeper into these large proteomic datasets, identifying sequence-level biochemical signatures inherent to each individual dataset and the given immunological niche. This chapter will outline the steps for a powerful approach to such analysis, utilizing the Automated Immune Molecule Separator (AIMS) software for the characterization of immunopeptidomic datasets. AIMS is a flexible tool for the identification of biophysical signatures in peptidomic datasets, the elucidation of nuanced differences in repertoires collected across tissues or experimental conditions, and the generation of machine learning models for future applications to classification problems. In learning to use AIMS, readers of this chapter will receive a broad introduction to the field of protein bioinformatics and its utility in the analysis of immunopeptidomic datasets and other large-scale immune repertoire datasets.
Moreira, R. S.; Filho, V. B.; Maia, G. A.; Soratto, T. A. T.; Kawagoe, E. K.; Russi, B. C.; Miletti, L. C.; Wagner, G.
Background: Although various tools provide proteomic information, each has its limitations regarding execution platforms, libraries, versions, and data output format. Therefore, integrating data analyses generated using different software programs is a manual process that can prolong the analysis time. Results: This paper presents FastProtein, a protein analysis pipeline tool developed in Java. This tool is user-friendly, easily installable, and provides important information regarding the subcellular location, transmembrane domains, signal peptide, molecular weight, isoelectric point, hydropathy, aromaticity, gene ontology, endoplasmic reticulum retention domains, and N-glycosylation domains of a protein. Furthermore, it helps determine the presence of glycosylphosphatidylinositol and obtain annotation information using InterProScan, PANTHER, PFam, and alignment-based annotation searches. Additionally, the software outputs a protein dataset with evidence of membrane localization. Conclusions: The proposed tool provides the scientific community with an easy and user-friendly computational tool for proteomics data analysis. The tool is applicable to both small datasets and proteome-wide studies. It can be used in either the command line interface mode or through a web interface installed on a local server or via the BioLib web interface (http://biolib.com/UFSC/FastProtein). FastProtein also accelerates proteomics analysis routines by generating multiple results in a one-step run. The software is open-source and freely available. Installation and execution instructions, as well as the source code and test files generated for tool validation, are provided at https://github.com/bioinformatics-ufsc/FastProtein.
Najem, H.; Pacheco, S.; Turunen, J.; Tripathi, S.; Steffens, A.; McCortney, K.; Walshon, J.; Chandler, J.; Stupp, R.; Lesniak, M. S.; Horbinski, C. M.; Winkowski, D.; Kowal, J.; Burks, J. K.; Heimberger, A. B.
Sequential multiplex methodologies such as Akoya CODEX, Miltenyi MACSima, Rarecyte Orion, and others require modification of the antibodies by conjugation to an oligo or a specific fluorophore, which means the use of off-the-shelf reagents is not possible. Modifications of these antibodies are typically performed via reduction chemistry and thus require verification and validation post-modification. Fixed panels are therefore developed due to various limitations including spectral overlap that creates spectral unmixing issues, steric hindrance, harsh antibody removal, and tissue degradation throughout the labeling. As such, a complex interrogation evaluating multiple study hypotheses and/or endpoints requires the development of sequential panels, reconstruction, and realignment of the tissue that necessitate a z-stack strategy. Standardized antibody panels are typically fixed and require substantial validation efforts to modify a single target and thus do not evolve with the pace of research interests. To increase the throughput of profiling cells within the human central nervous system (CNS), we developed and validated a CNS-specific library with an associated analysis platform using the newly developed Lunaphore COMET™ platform. The COMET™ is an automated staining/imaging instrument integrating a reagent deck for staining buffers and off-the-shelf label-free primary antibodies and fluorophore-labeled secondary antibodies, which feed into a circular plate holding up to 4 slides that are automatically imaged in microscope-operated control software. For this study, standard formalin-fixed paraffin-embedded histology slides are used. However, the COMET™ is capable of imaging fresh-frozen samples using specialized settings. Our methodologies address an unmet need in the neuroscience field while leveraging prior developmental efforts in the domain of immunology spatial profiling.
Cataloging and validating a large series of antibodies on the COMET™, along with developing CNS autofluorescence management strategies and optimizing standard operating procedures, has allowed for visualization at the subcellular level. Forty analytes can be used to analyze one specimen, which has clinical utility in cases in which the CNS can only be sampled by biopsy. CNS biopsies, depending on the anatomical location, can have limited available volume to a degree that requires prioritization and restriction to select analyses. In-depth bioinformatic imaging analysis can be done using standard bioinformatic tools and software such as Visiopharm®. These results establish a general framework for imaging and quantifying cell populations and networks within the CNS while providing the scientific community with standard operating procedures.
Mahmood, M. K.
Post-translational modifications (PTMs) of proteins play a vital role in various cellular functions. A PTM arises when a functional group is added to a protein through a covalent bond. A number of PTMs have been identified that are closely linked with diseases such as cancer and neurological disorders. Hydroxylation is one such PTM, modifying proline residues within a polypeptide sequence. Defective hydroxylation of proline, as occurs when ascorbic acid is absent in humans, produces scurvy and many other serious health issues. The prediction of hydroxylation sites in proline residues is therefore a challenging frontier. The experimental identification of hydroxyproline sites is difficult, expensive, and time-consuming. The diversity of protein sequences motivates the development of a computational tool that identifies hydroxylated sites quickly and with high prediction accuracy. In this work, a novel in silico predictor is developed through rigorous mathematical modeling to identify which proline sites are hydroxylated and which are not. The performance of the predictor was then verified using three validation tests, namely the self-consistency test, the cross-validation test, and the jackknife test, over the benchmark dataset. The jackknife results were compared with those of previous methods. In this comparison, the proposed tool is more accurate than the existing techniques. Hence, this scheme is highly useful relative to previous predictors.
Kirsher, D. Y.; Chand, S.; Phong, A.; Nguyen, B.; Szoke, B. G.; Ahadi, S.
Plasma is a rich source of biomolecules, including proteins, that reflect both health and disease. Due to their key roles in biological processes, proteins hold significant potential as biomarkers, fueling the rise of plasma proteome profiling in recent years. Despite widespread adoption, few studies have directly compared different plasma proteomics platforms, particularly those using mass spectrometry. Our study provides a comprehensive comparison of seven platforms across three leading technologies - SomaLogic, Olink, and Mass Spectrometry (MS) - including affinity-based approaches and various MS techniques, covering over 13,000 proteins. By applying these methods to the same cohort, we assess their performance, revealing key differences and complementary strengths. Our findings offer valuable insights for researchers, highlighting trade-offs in coverage and their implications for biomarker discovery and clinical applications. This study serves as an essential resource, offering both technical evaluation and biological insights to support the development of novel diagnostics and therapeutics through plasma proteomics.
Thom, C. S.; Davenport, P.; Fazelinia, H.; Liu, Z.-J.; Zhang, H.; Ding, H.; Roof, J.; Spruce, L. A.; Ischiropoulos, H.; Sola-Visner, M.
Background and Objective: Recent clinical studies have shown that transfusions of adult platelets increase morbidity and mortality in preterm infants. Neonatal platelets are hyporesponsive to agonist stimulation, and emerging evidence suggests developmental differences in platelet immune functions. This study was designed to compare the proteome and phosphoproteome of resting adult and neonatal platelets. Methods: We isolated resting umbilical cord blood-derived platelets from healthy full term neonates (n=9) and resting blood platelets from healthy adults (n=7), and compared protein and phosphoprotein contents using data independent acquisition mass spectrometry. Results: We identified 4745 platelet proteins with high confidence across all samples. Adult and neonatal platelets clustered separately by principal component analysis. Adult platelets were significantly enriched for immunomodulatory proteins, including β2 microglobulin and CXCL12, whereas neonatal platelets were enriched for ribosomal components and proteins involved in metabolic activities. Adult platelets were enriched for phosphorylated GTPase regulatory enzymes and proteins participating in trafficking, which may help prime them for activation and degranulation. Neonatal platelets were enriched for phosphorylated proteins involved in insulin growth factor signaling. Conclusions: Using state-of-the-art mass spectrometry, our findings expanded the known neonatal platelet proteome and identified important differences in protein content and phosphorylation compared with adult platelets. These developmental differences suggested enhanced immune functions for adult platelets and presence of a molecular machinery related to platelet activation. These findings are important to understanding mechanisms underlying key platelet functions as well as the harmful effects of adult platelet transfusions given to preterm infants.
Zlobina, K.; Gopinath, A.; Thubagere, A.
Classification of proteomic samples from sick and non-sick individuals is important for developing high-quality disease diagnostics. Creating a shortlist of proteins useful for diagnostics is challenging. This manuscript provides a simple algorithm for creating a multidimensional biomarker of health based on scanning proteomics data. Applied to several existing publicly available datasets, the algorithm yields a 9-protein indicator of atopic dermatitis and a 6-protein indicator of heart failure.
Machado, K. C. T.; Fiuza, T. D. S.; De Souza, S. J.; De Souza, G. A.
Biomarkers are molecular markers found in clinical samples which may aid disease diagnosis or prognosis. High-throughput techniques allow prospecting for such signature molecules by comparing gene expression between normal and sick cells. Cancer-testis antigens (CTAs) are promising candidates for cancer biomarkers because their expression is limited to the testis in normal conditions but is aberrant in various tumors. CTAs are routinely identified by transcriptomics, but a comprehensive characterization of their protein levels in different tissues is still necessary. Mass spectrometry-based proteomics allows the characterization of many cellular types and the production of large amounts of data, while computational tools allow the comparison of multiple datasets, and together those may corroborate insights obtained at the transcriptomic level. Here a computational meta-analysis explores CTA protein abundance in the proteomic layer of healthy and tumor tissues. The combined datasets present the expression patterns of 17,200 unique proteins, including 241 known CTAs previously described at the transcriptomic level. Those were further ranked as significantly enriched in tumor tissues (22 proteins), exclusive to tumor tissues (42 proteins) or abundant in healthy tissues (32 proteins). This analysis illustrates the possibilities for tumor proteome characterization and the consequent identification of biomarker candidates and/or therapeutic targets.
Vincent, D.; Bui, A.; Ram, D.; Ezernieks, V.; Shahinfar, S.; Luke, T.; Rochfort, S.; Rigas, N.; Panozzo, J.; Daetwyler, H.; Hayden, M. J.
Late maturity alpha-amylase (LMA) is a wheat genetic defect causing the synthesis of high isoelectric point (pI) alpha-amylase in the aleurone as a result of a temperature shock during mid-grain development or prolonged cold throughout grain development, leading to an unacceptably low falling number (FN) at harvest or during storage. High pI alpha-amylase is normally not synthesized until after maturity in seeds, when they may sprout in response to rain or germinate following sowing of the next season's crop. Whilst the physiology is well understood, the biochemical mechanisms involved in grain LMA response remain unclear. We have employed high-throughput proteomics to analyse thousands of wheat flours displaying a range of LMA values. We have applied an array of statistical analyses to select LMA-responsive biomarkers and we have mined them using a suite of tools applicable to wheat proteins. To our knowledge, this is not only the first proteomics study tackling the wheat LMA issue, but also the largest plant-based proteomics study published to date. Logistics, technicalities, requirements, and bottlenecks of such an ambitious large-scale high-throughput proteomics experiment, along with the challenges associated with big data analyses, are discussed. We observed that stored LMA-affected grains activated their primary metabolisms such as glycolysis and gluconeogenesis, TCA cycle, along with DNA- and RNA-binding mechanisms, as well as protein translation. This logically transitioned to protein folding activities driven by chaperones and protein disulfide isomerase, as well as protein assembly via dimerisation and complexing. The secondary metabolism was also mobilised with the up-regulation of phytohormones, chemical and defense responses. LMA further invoked cellular structures, including ribosomes, microtubules, and chromatin.
Finally, and unsurprisingly, LMA expression greatly impacted grain starch and other carbohydrates with the up-regulation of alpha-gliadins and starch metabolism, whereas LMW glutenin, stachyose, sucrose, UDP-galactose and UDP-glucose were down-regulated. This work demonstrates that proteomics deserves to be part of the wheat LMA molecular toolkit and should be adopted by LMA scientists and breeders in the future.
Pittala, M. G. G.; Leggio, L.; Paterno, G.; Giusto, E.; Civiero, L.; Cunsolo, V.; Vivarelli, S.; Di Francesco, A.; Alpi, E.; Saletti, R.; Iraci, N.
Background: Current proteomics techniques allow rapid identification and quantification of proteins within any given biological source. In particular, nanoUHPLC/High-Resolution nanoESI-MS/MS enables the characterization of proteins in complex biological samples due to its high sensitivity, accuracy, and scalability. However, LC-MS/MS proteomics might still be susceptible to laboratory and sample-associated contaminants, which can significantly compromise the quality and reliability of data. Therefore, an accurate identification and annotation of such contaminants is crucial for the development of robust proteomics databases and spectral-libraries related search engines. This approach is of special interest in the field of secretome and extracellular vesicles (EVs), membrane-enclosed nanostructures that contain a variety of proteins crucial for cell-to-cell communication and translational applications. Results: When working in ex vivo/in vitro settings, proteins from fetal bovine serum (FBS), commonly employed in standard cell culture media, may interfere with the proteome analysis. To address this issue, we conceived and designed SPROUTS_DB, Serum Protein Repository Of Unwanted Target(ed) Sequences DataBase, a dedicated resource to catalog serum-derived contaminants. Starting from media supplemented with EV-depleted FBS, we simulated cell growth conditions - in the absence of cells - followed by ultracentrifugation. LC-MS/MS analysis of these samples resulted in the identification of a novel set of 1,288 contaminant proteins, which has been deposited in the ProteomeXchange repository (identifier PXD044137). SPROUTS_DB contains primarily soluble proteins, mainly related to the Gene Ontology categories Extracellular Region and Extracellular Space, in line with the nature of the starting sample.
In contrast, only a small fraction of the contaminants is classified as membrane-associated proteins, supporting the limited vesicle contamination in the complete medium, due to the use of EV-depleted FBS. Of note, we demonstrated that SPROUTS_DB outperforms existing contaminants databases, ensuring that only peptide spectra relevant to the examined sample are retained and identified as true positive data. Conclusions: Considering that even proteins from phylogenetically distant organisms share extensive stretches of sequences, SPROUTS_DB is designed to discern contaminants from real sample proteins of interest, minimizing false positive identifications. To the best of our knowledge, SPROUTS_DB is the most updated database of contaminants useful for proteomics investigations of cellular secretomes and EV-containing samples.
Zhong, J.; Wu, J. R.; Zeng, X.; Moran, M.; Ma, B.
Advancements in mass spectrometry (MS)-based proteomics have produced large-scale datasets, necessitating the development of effective tools for peptide identification. Here, we present LooMS, a novel tool specifically designed for identifying peptides in data-independent acquisition (DIA) datasets. LooMS employs an innovative approach, using an unbiased generation strategy for positive and negative samples, which reduces the risk of overfitting in peptide identification with deep learning models. Additionally, LooMS addresses various critical aspects of DIA mass spectra data analysis, constructing a comprehensive set of 43 features for training deep learning models, which cover different stages of DIA data analysis. Notably, we propose a false discovery rate (FDR) control strategy that integrates results from both LooMS and DiaNN, another leading peptide identification tool. Our results demonstrate significant improvements in peptide identification performance, with enhancements of 40.61% and 26.60% at the unique peptide level for human and mouse datasets, respectively. Highlights: (1) LooMS is a novel tool for identifying peptides in DIA datasets that adopts an innovative unbiased positive and negative sample generation strategy, which aims to avoid overfitting in peptide identification with deep learning models. (2) LooMS comprehensively considers various aspects of data analysis for DIA mass spectra and builds 43 useful features for training deep learning models, which involve different stages of DIA data analysis. (3) An FDR control strategy for integrating results from both LooMS and DiaNN is proposed, which can significantly improve the identification of peptides because the two tools were designed around different peptide-detection features.
Scott, A. M.; Karlsson, C.; Mohanty, T.; Vaara, S. T.; Linder, A.; Malmstrom, J.; Malmstrom, L.
The statistical validation of peptide and protein identifications in mass spectrometry proteomics is a critical step in the analytical workflow. This is particularly important in discovery experiments to ensure only confident identifications are accumulated for downstream analysis and biomarker consideration. However, the inherent nature of discovery proteomics experiments leads to scenarios where the search space will inflate substantially due to the increased number of potential proteins that are being queried in each sample. In these cases, issues will begin to arise when the machine learning algorithms that are trained on an experiment-specific basis cannot accurately distinguish between correct and incorrect identifications and will struggle to accurately control the false discovery rate. Here, we propose an alternative validation algorithm trained on a curated external data set of 2.8 million extracted peakgroups that leverages advanced machine learning techniques to create a generalizable peakgroup scoring (GPS) method for data independent acquisition (DIA) mass spectrometry. By breaking the reliance on the experimental data at hand and instead training on a curated external dataset, GPS can confidently control the false discovery rate while increasing the number of identifications and providing more accurate quantification in different search space scenarios. To first test the performance of GPS in a standard experimental environment and to provide a benchmark against other methods, a novel spike-in data set with known varying concentrations was analyzed. When compared to existing methods, GPS increased the number of identifications by 5-18% and was able to provide more accurate quantification by increasing the number of ratio-validated identifications by 24-74%.
To evaluate GPS in a larger search space, a novel data set of 141 blood plasma samples from patients developing acute kidney injury after sepsis was searched with a human tissue spectral library (10000+ proteins). Using GPS, we were able to provide a 207-377% increase in the number of candidate differentially abundant proteins compared to the existing methods while maintaining competitive numbers of global identifications. Finally, using an optimized human tissue library and workflow, we were able to identify 1205 proteins from the 141 plasma samples and increase the number of candidate differentially abundant proteins by 70.87%. With the addition of machine learning aided differential expression, we were able to identify potential new biomarkers for stratifying subphenotypes of acute kidney injury in sepsis. These findings suggest that by using a generalized model such as GPS in tandem with a massive-scale spectral library it is possible to expand the boundaries of discovery experiments in DIA proteomics. GPS is open source and freely available on GitHub at https://github.com/InfectionMedicineProteomics/gscore.
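For readers unfamiliar with what "controlling the false discovery rate" involves in identification scoring, the sketch below shows the generic target-decoy q-value calculation that is common in proteomics. It is an assumption that this convention is relevant here: the abstract does not describe how GPS estimates FDR, and the function name `target_decoy_qvalues` and the `(score, is_decoy)` input format are hypothetical.

```python
def target_decoy_qvalues(hits):
    """Compute q-values from scored target and decoy identifications.

    hits: iterable of (score, is_decoy) pairs; a higher score means a better match.
    Returns a list of (score, is_decoy, q_value) sorted by descending score.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)

    # Estimated FDR at each score threshold: decoys accepted / targets accepted.
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this hit would still be accepted.
    qvalues = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


if __name__ == "__main__":
    demo = [(0.6, False), (0.9, False), (0.7, True), (0.8, False)]
    for row in target_decoy_qvalues(demo):
        print(row)
```

Keeping only target hits with a q-value at or below a chosen cutoff (1% is a common choice) gives the set of identifications reported at that FDR. The quality of the score therefore determines how many identifications survive the cutoff.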
Chen, H.-C.; Newton, C. J.; Zheng, Y.; Kong, F.; Yao, Y.; Yang, L.; Kvitko, B. H.
The apoplast is a critical interface in plant-pathogen interactions particularly in the context of pattern-triggered immunity (PTI), which is initiated by recognition of microbe-associated molecular patterns (PAMPs). Our study characterizes the proteomic profile of the Arabidopsis apoplast during PTI induced by flg22, a 22 amino acid bacterial flagellin epitope, to elucidate the output of PTI. Apoplastic washing fluid (AWF) was extracted with minimal cytoplasmic contamination for LC-MS/MS analysis. We observed consistent identification of PTI enriched and depleted peptides across replicates with limited correlation between total protein abundance and transcript abundance. We observed topological bias in peptide recovery of receptor-like kinases with peptides predominantly recovered from their ectodomains. Notably, tetraspanin 8, an exosome marker, was enriched in PTI samples. We additionally confirmed increased concentrations of exosomes during PTI. This study enhances our understanding of the proteomic changes in the apoplast during plant immune responses and lays the groundwork for future investigations into the molecular mechanisms of plant defense under recognition of pathogen molecular patterns.
Sun, R.; Zhu, Y. J.; Savad, A.; Ge, W.; Luna, A.; Liang, S.; Segura, L. T.; Rajapakse, V. N.; Yu, C.; Zhang, H.; Fang, J.; Wu, F.; Xie, H.; Saez-Rodriguez, J.; Ying, H.; Reinhold, W. C.; Sander, C.; Pommier, Y.; Neel, B. G.; Guo, T.; Aebersold, R.
Treatment and relevant targets for breast cancer (BC) remain limited, especially for triple-negative BC (TNBC). We quantified the proteomes of 76 human BC cell lines using data-independent acquisition (DIA)-based proteomics, identifying 6091 proteins. We then established a 24-protein panel distinguishing TNBC from other BC types. Integrating prior multi-omics datasets with the present proteomic results to predict the sensitivity of 90 drugs, we found that proteomics data improved drug sensitivity predictions. The sensitivity of the 90 drugs was mainly associated with cell cytoskeleton, signal transduction and mitochondrial function. We next profiled the proteome changes of nine cell lines (five TNBC cell lines, four non-TNBC cell lines) perturbed by EGFR/AKT/mTOR inhibitors. In the TNBC cell lines, metabolism pathways were dysregulated after EGFR/mTOR inhibitor treatment, while RNA modification and cell cycle pathways were dysregulated after AKT inhibitor treatment. Our study presents a systematic multi-omics and in-depth analysis of the proteome of BC cells. This work aims to aid in prioritization of potential therapeutic targets for TNBC as well as to provide insight into adaptive drug resistance in TNBC.
Chang, A. C.-C.; Schlegel, B. T.; Carleton, N.; McAulife, P. F.; Oesterreich, S.; Schwartz, R.; Lee, A. V.
Background: Dysplastic tissue architecture in estrogen receptor-positive (ER+) breast cancer across therapy-naive and therapy-exposed cancer tissues presents unique challenges in the analysis of spatial transcriptomics. Many tools for deconvolution are developed on well-structured tissue architectures such as the 10x Genomics mouse hippocampus dataset. Spatial transcriptomics analysis could offer valuable insights into treatment response, but faces limitations in cellular resolution. Methods: To address this problem, we developed CITEgeist, a computational tool for spatial transcriptomic deconvolution using integrated proteomics data from the same slide. Visium Antibody Capture technology was applied alongside our novel algorithm to analyze the tumor microenvironment. We demonstrate the reliability of our method using pre- and post-treatment samples from six breast cancer cases. Results: Our approach revealed previously undetectable cellular interactions within the tumor microenvironment. By taking an interoperable approach to software development and grounding our algorithm in interpretable variables, we demonstrate how CITEgeist deconvolution is not only accurate but robust enough to be directly used as input in external analytical tools developed by other research teams. We then applied this approach to a set of specimens from a prospective trial our group ran and further validated the findings in a series of in vitro experiments as a demonstrated use case of the utility, necessity, and flexibility of CITEgeist; and the potential of our method to rapidly translate novel clinical samples to new biological insights. Conclusions: CITEgeist addresses a critical technical gap in spatial multi-omics analysis through an integrated, multi-disciplinary approach. This work demonstrates the value of combining clinical, translational, and computational expertise to identify novel mechanisms of treatment resistance, potentially transforming therapeutic strategies for resistant disease.
Lee, N.; Yoo, H.; Han, D.; Yang, H.
Data-independent acquisition (DIA) has gained much attention in mass spectrometry (MS)-based proteomics for its improved reproducibility and unbiased data acquisition. In DIA-MS, the spectral library is crucial in peptide identification. However, this method is limited to peptides previously identified via data-dependent acquisition (DDA) MS experiments. This study proposes a deep learning approach for generating spectral libraries, even for previously unseen peptides. While most deep learning-based methods rely on one-hot encoding representation for peptides, the proposed method incorporates physicochemical features, including atomic composition, hydrophobicity, flexibility, fractional surface probability, and aromaticity. We introduce sparsity-regularized neural network layers to facilitate the selection and combination of important high-dimensional physicochemical features and improve prediction performance. Furthermore, we suggest a transfer learning strategy for training the proposed deep neural networks having multiple heterogeneous input channels. Numerical experiments using benchmark DDA-MS data demonstrated that the proposed deep learning model outperformed existing benchmark models, such as Prosit and DeepDIA, particularly in predicting retention times. The proposed models with sparsity regularization also identified more peptides from HeLa cell DIA data than the other deep learning models.