AI-m6ARS: Machine learning-driven m6A RNA methylation site discovery with integrated sequence, conservation, and geographical descriptors
Uthayopas, K.; de Sa, A. G. C.; Ascher, D. B.
Show abstract
N6-Methyladenosine (m6A) is a predominant type of human RNA methylation, regulating diverse biochemical processes and being associated with the development of several diseases. Despite its significance, an extensive experimental examination across diverse cellular and transcriptome contexts is still lacking due to time and cost constraints. Computational models have been proposed to prioritise potential m6A methylation sites, although having limited predictive performance due to inadequate characterisation and modelling of m6A sites. This work presents AI-m6ARS, a novel model that utilises integrated sequence, conservation, and geographical descriptive features to predict human m6A methylation sites. The model was trained using the Light Gradient Boosting Machine (LightGBM) algorithm, which was coupled with comprehensive feature selection to improve the data quality. AI-m6RS demonstrates strong predictive capabilities, achieving an impressive area under the receiver operating characteristic curve of 0.87 on cross-validation. Consistent results on unseen transcripts in a blind test highlight the AI-m6ARS generalisability. AI-m6ARS also demonstrates comparable performance to state-of-the-art models, but offers two significant benefits: the model interpretability and the availability of a user-friendly web server. The AI-m6ARS web server offers valuable insights into the distribution of m6A sites within the human genome, thereby facilitating progress in medical applications. GRAPHICAL ABSTRACT O_FIG O_LINKSMALLFIG WIDTH=200 HEIGHT=77 SRC="FIGDIR/small/599439v1_ufig1.gif" ALT="Figure 1"> View larger version (13K): org.highwire.dtl.DTLVardef@12d7502org.highwire.dtl.DTLVardef@15cf6b5org.highwire.dtl.DTLVardef@490699org.highwire.dtl.DTLVardef@5046c1_HPS_FORMAT_FIGEXP M_FIG C_FIG
Matching journals
The top 4 journals account for 50% of the predicted probability mass.