ProtmRNA: Cross-Modal Knowledge Transfer from Proteins to Messenger RNA

Xu, G.; Wu, X.; Ma, J.

2026-05-19 bioinformatics
10.64898/2026.05.19.726141 bioRxiv

Motivation: According to the central dogma of molecular biology, messenger RNA (mRNA) sequences are directly translated into amino acid sequences, positioning mRNA as the fundamental intermediary between genetic information and functional proteins. This natural correspondence suggests that mRNA sequence analysis could benefit substantially from the rich evolutionary and functional representations learned by large-scale protein language models.

Results: ProtmRNA repurposes the pre-trained ESM-2 protein language model for mRNA sequence processing via cross-modal transfer learning. Evaluated on mRNA- and protein-related datasets, as well as on eight additional benchmarks compiled in this study, ProtmRNA achieves performance comparable or superior to that of state-of-the-art mRNA language models while using less than half the pre-training computational resources. By demonstrating that protein-derived knowledge can be efficiently transferred to mRNA, this work establishes the potential of cross-modal transfer learning between biological sequences and offers a resource-efficient paradigm for advancing mRNA sequence understanding.

Availability and Implementation: The pre-trained ProtmRNA model and the eight CDS-region regression benchmarks curated in this study are publicly available at https://github.com/pesenteur/ProtmRNA.
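The codon-to-amino-acid correspondence that motivates ProtmRNA can be made concrete with the standard genetic code. The Python sketch below is illustrative only: `CODON_TABLE` and `translate_cds` are hypothetical names, not part of the ProtmRNA repository, and nothing here reflects how ProtmRNA tokenizes or embeds sequences. It shows that translation is a deterministic, many-to-one mapping from 64 codons onto 20 amino acids plus a stop signal.

```python
"""Minimal sketch of the standard genetic code (NCBI translation table 1).

Illustrative only; hypothetical helper names, not code from ProtmRNA.
"""

# Bases in U, C, A, G order; the 64-character string lists the amino acid
# (one-letter code, '*' = stop) for every codon in that nested order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(mrna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon.

    Assumes the sequence is in frame from position 0; DNA-style 'T' is
    treated as 'U'. A trailing incomplete codon is ignored, and characters
    other than A/C/G/U raise KeyError.
    """
    seq = mrna.upper().replace("T", "U")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # 64 codons collapse onto 21 symbols (20 amino acids plus stop).
    print(len(CODON_TABLE), len(set(CODON_TABLE.values())))
    print(translate_cds("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))
```

Running the script prints `64 21` followed by `MAIVMGR`; translation halts at the in-frame UGA stop codon, so the downstream codons are never read.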
