Codon arrangement modulates MHC-I peptides presentation

Daouda, T.; Dumont-Lagacé, M.; Feghaly, A.; Benslimane, Y.; Panes, R.; Courcelles, M.; Benhammadi, M.; Harrington, L.; Thibault, P.; Major, F.; Bengio, Y.; Gagnon, E.; Lemieux, S.; Perreault, C.

2020-06-04 bioinformatics
10.1101/2020.06.03.078824 bioRxiv

MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and neoplastic cells by CD8 T cells. However, accurately predicting the MAP repertoire remains difficult, because only a fraction of the transcriptome generates MAPs. In this study, we investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), which predicts MAP presentation solely from the mRNA sequences flanking the MAP-coding codons (MCCs), excluding the MCCs per se. CAMAP predictions were significantly more accurate when using original codon sequences than when using shuffled codon sequences, which reflect amino acid usage. Furthermore, predictions were independent of mRNA expression and of MAP binding affinity to MHC-I molecules, and applied to several cell types and species. Combining MAP ligand scores, transcript expression levels and CAMAP scores was particularly useful to increase MAP prediction accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon arrangement in the regulation of MAP presentation and support the integration of both translational and post-translational events in predictive algorithms to improve modeling of the immunopeptidome.

Author summary

MHC-I associated peptides (MAPs) are small fragments of intracellular proteins presented at the surface of cells and used by the immune system to detect and eliminate cancerous or virus-infected cells. While it is theoretically possible to predict which portions of the intracellular proteins will be naturally processed by the cells to ultimately reach the surface, current methodologies have prohibitively high false discovery rates. Here we introduce an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), which integrates information from mRNA-to-protein translation with other factors regulating MAP biogenesis (e.g. MAP ligand score and transcript expression levels) to improve MAP prediction accuracy. While most MAP predictive approaches focus on MAP sequences per se, CAMAP's novelty is to analyze the MAP-flanking mRNA sequences, thereby providing completely independent information for MAP prediction. We show on several datasets that the integration of CAMAP scores with other known factors involved in MAP presentation (i.e. MAP ligand score and mRNA expression) significantly improves MAP prediction accuracy, and we further validate CAMAP's learned features using an in vitro assay. These findings may have major implications for the design of vaccines against cancers and viruses, and in times of pandemics could accelerate the identification of relevant MAPs of viral origin.
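The two sequence manipulations the abstract relies on (taking the codons that flank the MCCs while leaving the MCCs out, and replacing codons with synonymous ones so that the amino acid sequence is unchanged) can be illustrated with a small Python sketch. This is not the authors' code: the function names, the toy coding sequence, the flank size of three codons and the uniform random choice among synonymous codons are all assumptions made for illustration, and the paper's actual context length and shuffling procedure are not given in this summary.

```python
import itertools
import random

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(seq):
    """Split an in-frame nucleotide sequence into a list of codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate an in-frame nucleotide sequence into amino acids."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(seq))


def flanking_context(cds, mcc_start, mcc_len, n_flank):
    """Return (upstream, downstream) flanks around the MCCs, excluding the MCCs.

    mcc_start and mcc_len are counted in codons; n_flank is a hypothetical
    context size, also in codons.
    """
    codons = split_codons(cds)
    upstream = codons[max(0, mcc_start - n_flank):mcc_start]
    mcc_end = mcc_start + mcc_len
    downstream = codons[mcc_end:mcc_end + n_flank]
    return "".join(upstream), "".join(downstream)


def shuffle_synonymous(seq, rng):
    """Replace every codon with a randomly chosen synonymous codon.

    Assumption: uniform choice among synonyms. The encoded protein is unchanged.
    """
    return "".join(
        rng.choice(SYNONYMS[CODON_TO_AA[codon]]) for codon in split_codons(seq)
    )


# Toy 12-codon coding sequence; the "MCCs" are codons 4 to 6 (0-based), purely illustrative.
toy_cds = "ATGGCTCTGAGCCGTAAAGGTCTGCCGGAATTTGTT"
up, down = flanking_context(toy_cds, mcc_start=4, mcc_len=3, n_flank=3)
rng = random.Random(0)
up_shuffled = shuffle_synonymous(up, rng)
down_shuffled = shuffle_synonymous(down, rng)
print(up, "->", up_shuffled, "| protein:", translate(up_shuffled))
print(down, "->", down_shuffled, "| protein:", translate(down_shuffled))
```

Whatever codons the random draw selects, the translated flanks stay the same, which is the property the in vitro assay described above depends on: only the codon arrangement differs between variants.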

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Percentile  Probability
1     PLOS Computational Biology                          1633                    Top 0.7%    22.5%
2     ImmunoInformatics                                   11                      Top 0.1%    14.7%
3     Frontiers in Immunology                             586                     Top 0.4%    12.5%
4     Computational and Structural Biotechnology Journal  216                     Top 0.6%    6.4%
----- 50% of probability mass above -----
5     Bioinformatics                                      1061                    Top 4%      4.9%
6     Frontiers in Bioinformatics                         45                      Top 0.1%    3.6%
7     Bioinformatics Advances                             184                     Top 2%      3.3%
8     Scientific Reports                                  3102                    Top 43%     2.9%
9     iScience                                            1063                    Top 7%      2.7%
10    Frontiers in Genetics                               197                     Top 3%      2.4%
11    PeerJ                                               261                     Top 6%      1.8%
12    BMC Bioinformatics                                  383                     Top 4%      1.7%
13    PLOS ONE                                            4510                    Top 55%     1.7%
14    GigaScience                                         172                     Top 1%      1.7%
15    Frontiers in Physiology                             93                      Top 4%      0.9%
16    Vaccines                                            196                     Top 2%      0.9%
17    Computers in Biology and Medicine                   120                     Top 4%      0.9%
18    Briefings in Bioinformatics                         326                     Top 7%      0.7%
19    Immunology                                          29                      Top 1%      0.7%
20    The Journal of Immunology                           146                     Top 2%      0.7%
21    NAR Genomics and Bioinformatics                     214                     Top 4%      0.7%
22    npj Systems Biology and Applications                99                      Top 3%      0.6%
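
The 50% cutoff in the list above can be checked by accumulating the listed probabilities in rank order until the running total reaches the threshold. The sketch below is a plain illustration of that arithmetic, not the site's own code; the function name `journals_to_reach` is hypothetical, and the values are the first six probabilities copied from the list.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Return (count, cumulative mass) for the first prefix reaching the threshold.

    probabilities are percentages in rank order; returns (None, total) if the
    threshold is never reached.
    """
    total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= threshold:
            return count, total
    return None, total


# Probabilities (in %) of the six highest-ranked journals, copied from the list.
listed = [22.5, 14.7, 12.5, 6.4, 4.9, 3.6]
count, mass = journals_to_reach(listed)
print(count, round(mass, 1))  # prints: 4 56.1
```

The first three journals sum to 49.7%, just short of the threshold, so the fourth is needed and the cumulative mass at the cutoff is 56.1% rather than exactly 50%.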