Carafe2 enables high quality in silico spectral library generation for timsTOF data-independent acquisition proteomics
Wen, B.; Paez, J. S.; Hsu, C.; Canzani, D.; Chang, A. T.; Shulman, N.; MacLean, B. X.; Berg, M. D.; Villen, J.; Fondrie, W.; Pino, L.; MacCoss, M. J.; Noble, W. S.
Data-independent acquisition (DIA) proteomics enables reproducible and systematic peptide detection and quantification, and trapped ion mobility spectrometry (TIMS) on the timsTOF platform further improves DIA by synchronizing ion mobility separation with quadrupole precursor sampling. Analyzing the highly multiplexed spectra generated by DIA typically relies on spectral libraries, and fully leveraging the additional ion mobility dimension requires these libraries to include accurate retention time (RT), fragment ion intensity, and ion mobility annotations. Existing in silico spectral library generation tools either lack ion mobility support entirely or rely on models trained on data-dependent acquisition (DDA) data. Such models can introduce a mismatch when applied to a given timsTOF DIA dataset, because they may not capture its experiment-specific biases. Carafe is a software tool that uses deep learning models to generate high-quality, experiment-specific in silico libraries by training directly on DIA data. In this study, we extend Carafe to generate libraries for timsTOF DIA data by fine-tuning RT, fragment ion intensity, and ion mobility prediction models on timsTOF DIA data. Carafe2 operates directly on native timsTOF raw data (Bruker .d directories) without the need for data conversion. We demonstrate the performance of Carafe2 across a wide range of DIA applications, including global proteome, phosphoproteome, and plasma proteome datasets. Comparing the Carafe2 fine-tuned RT, fragment ion intensity, and ion mobility prediction models with pretrained DDA models, we find that the Carafe2 models outperform the pretrained models on a variety of DIA datasets.
We then demonstrate the utility of in silico libraries generated by Carafe2 for peptide detection on several types of timsTOF DIA datasets by comparing them with libraries generated by DDA-trained AlphaPeptDeep models and DIA-NN built-in models, and with empirical spectral libraries generated from DDA experiments.