Identification and Analysis of Novel RNA Editing Sites in Neurodegenerative Diseases Using Machine Learning Approaches.

Jabin, S.; Natarajan, E.

2026-04-13 neuroscience
10.64898/2026.04.09.716726 bioRxiv

Background: RNA editing is a post-transcriptional modification that alters the sequence of an RNA transcript. Two types of RNA editing are known in mammals, involving the enzymatic deamination of either adenosine to inosine (A-to-I) or cytidine to uridine (C-to-U). A-to-I editing, the most common form, is mediated by the ADAR (adenosine deaminases acting on RNA) family of enzymes: ADAR1, ADAR2, and ADAR3. The editing event alters the hydrogen-bond pairing of the nucleobase, so the edited site is recorded as guanosine rather than the original adenosine. Deregulation of RNA editing has been linked to several nervous system and neurodegenerative diseases. This work focuses on Alzheimer's disease (AD), using samples from the anterior cingulate cortex of human brain tissue. AD is the leading form of dementia worldwide and a neurodegenerative condition prevalent in the elderly.

Methodology: A total of 20 raw RNA-sequencing samples (10 controls and 10 AD cases) were retrieved from NCBI using the SRA Toolkit. Read quality was assessed with FastQC, and reads were processed with Trimmomatic. Alignment was performed with the STAR RNA-seq aligner. RNA editing sites were detected with REDItools, and the detected sites were then annotated against the REDIportal database. The resulting control-specific and disease-specific novel editing sites were merged into a single dataset containing exclusively novel, group-specific A-to-I editing events. This merged dataset was used for downstream feature extraction and machine learning analysis. Probability-based filtering was applied to extract high-confidence disease-associated sites, and the corresponding gene list was used for computational biological validation, pathway and functional enrichment analysis, and assessment of overlap with known AD loci.

Results: Random Forest showed the highest accuracy (0.804) and ROC-AUC (0.854). The most important features distinguishing control from disease novel sites in the Random Forest model were coverage (~0.35), editing level (~0.33), and GC content (~0.15). Mean AEI values were higher in both male and female disease cases (~0.48-0.50) than in male and female controls (~0.14-0.21). Mean ADAR1_CPM was higher in controls (123.65-143.30) than in disease cases (88.35-97.93), ADAR2_CPM was similar across all groups (~3.7-4.7), and ADAR3_CPM was very low in all groups (~0-0.02). Most candidate editing sites were located in exons (~62-67%), followed by CDS regions (~17-21%) and a smaller fraction in gene regions (~15-16%). Editing alterations preferentially affect molecular systems governing synaptic structure, neurotransmission, and central nervous system integrity. In the main set, 33 of the 2576 high-confidence genes identified overlapped with AD GWAS loci. In the core set, 11 of the 1367 high-confidence genes identified overlapped with AD GWAS loci.

Conclusion: Features such as coverage, editing level, and GC content contributed most to the classification. Alu sites were negligible compared with non-Alu sites, yet mean AEI values were higher in disease cases than in controls. Mean ADAR1_CPM was higher than ADAR2_CPM and ADAR3_CPM. Sex was not a major factor. High-confidence disease-associated RNA editing sites are strongly biased toward transcript-centric regions, particularly exons, with a notable subset affecting coding sequences. Importantly, enrichment of neurodegeneration-associated pathways and cognition-related human phenotypes further supports the disease relevance of these gene networks. RNA editing events in the Alzheimer's disease cortex may represent a regulatory mechanism largely independent of inherited genetic susceptibility loci.
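Several quantities in the abstract (editing level, CPM, and the share of genes overlapping GWAS loci) are simple ratios. The sketch below illustrates them under commonly used definitions, which are an assumption here because the abstract does not state how the authors computed them. The function names and the example read counts are hypothetical; only the figures 33 of 2576 and 11 of 1367 come from the abstract.

```python
def editing_level(a_reads: int, g_reads: int) -> float:
    """Fraction of reads showing G at a reference-A site (assumed definition)."""
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("site has no A or G coverage")
    return g_reads / total


def cpm(gene_count: int, total_counts: int) -> float:
    """Counts per million: a gene's read count scaled to one million reads."""
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    return gene_count * 1_000_000 / total_counts


def overlap_fraction(overlapping: int, total: int) -> float:
    """Share of a gene set that overlaps a reference list of loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    return overlapping / total


# Hypothetical site: 30 reads with A and 10 reads with G.
print(editing_level(30, 10))

# Hypothetical gene: 50 reads in a library of 2,000,000 counted reads.
print(cpm(50, 2_000_000))

# Gene-set overlaps quoted in the abstract.
print(f"main set: {overlap_fraction(33, 2576):.2%}")
print(f"core set: {overlap_fraction(11, 1367):.2%}")
```

With the abstract's figures, the overlap comes to roughly 1.3% of the main set and 0.8% of the core set, in line with the statement that the edited genes are largely distinct from known AD GWAS loci.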

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | PLOS ONE | 4510 papers in training set | Top 12% | 15.0%
2 | Journal of Alzheimer’s Disease | 39 papers in training set | Top 0.1% | 14.6%
3 | Scientific Reports | 3102 papers in training set | Top 17% | 6.4%
4 | Alzheimer's & Dementia | 143 papers in training set | Top 1% | 4.4%
5 | Frontiers in Aging Neuroscience | 67 papers in training set | Top 0.9% | 3.9%
6 | Journal of Alzheimer's Disease | 43 papers in training set | Top 0.5% | 3.7%
7 | Alzheimer's & Dementia: Translational Research & Clinical Interventions | 16 papers in training set | Top 0.3% | 1.9%
8 | Neurobiology of Disease | 134 papers in training set | Top 2% | 1.9%
50% of probability mass above
9 | BMC Medical Genomics | 36 papers in training set | Top 0.5% | 1.7%
10 | International Journal of Molecular Sciences | 453 papers in training set | Top 9% | 1.4%
11 | Molecular Neurobiology | 50 papers in training set | Top 0.6% | 1.2%
12 | Frontiers in Psychiatry | 83 papers in training set | Top 2% | 1.2%
13 | Neuroscience | 88 papers in training set | Top 2% | 1.2%
14 | Molecular Biology Reports | 19 papers in training set | Top 0.3% | 1.1%
15 | BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
16 | Journal of Proteome Research | 215 papers in training set | Top 2% | 0.9%
17 | Healthcare | 16 papers in training set | Top 1% | 0.9%
18 | Biology of Sex Differences | 29 papers in training set | Top 0.5% | 0.9%
19 | PLOS Computational Biology | 1633 papers in training set | Top 22% | 0.9%
20 | Cells | 232 papers in training set | Top 5% | 0.9%
21 | Neurobiology of Aging | 95 papers in training set | Top 2% | 0.8%
22 | Acta Neuropathologica Communications | 81 papers in training set | Top 1% | 0.8%
23 | Frontiers in Genetics | 197 papers in training set | Top 9% | 0.8%
24 | Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.8%
25 | Genomics | 60 papers in training set | Top 2% | 0.8%
26 | eLife | 5422 papers in training set | Top 57% | 0.8%
27 | Molecular Brain | 26 papers in training set | Top 0.3% | 0.8%
28 | Biology | 43 papers in training set | Top 3% | 0.7%
29 | eneuro | 389 papers in training set | Top 9% | 0.7%
30 | European Journal of Neuroscience | 168 papers in training set | Top 2% | 0.7%
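
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is illustrative only: the function name is hypothetical, and the inputs are the rounded percentages shown in the list, so the totals are approximate.

```python
from itertools import accumulate


def entries_to_reach(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    total to reach the threshold, or None if it is never reached."""
    for rank, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return rank
    return None


# Rounded percentages for the ten highest-ranked journals in the list.
top_probabilities = [15.0, 14.6, 6.4, 4.4, 3.9, 3.7, 1.9, 1.9, 1.7, 1.4]

print(entries_to_reach(top_probabilities))  # prints 8
```

The first seven entries sum to about 49.9%, so the eighth is the one that carries the running total past 50%.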