
Using Deep Learning with Different Architectures to Recognize RNA:DNA Triplex Structures from Histone Modification Features

Tsenum, J. L.

2025-11-24 bioinformatics
10.1101/2025.09.16.676231 bioRxiv

Long non-coding RNAs (lncRNAs) can perform their regulatory roles by forming triple helices through RNA-DNA interaction. Although this has been verified by a few in vivo and in vitro methods, in silico approaches that predict the potential of lncRNAs and DNA sites to form triplex structures are still required. Triplexator has also predicted large numbers of lncRNAs and DNA sites that have the potential to form a triplex structure. There is also emerging experimental evidence that the presence of epigenetic marks at DNA sites and lncRNAs can facilitate the formation of RNA:DNA triplex structures. There is therefore a strong demand for computational approaches, such as deep learning, that can make novel predictions about RNA:DNA triplex structure formation. In this study, we developed four deep neural network models that predict, genome-wide, the potential of lncRNAs and DNA sites to form triple helices, taking histone modification marks as features. Our data were first passed through Triplexator to screen out lncRNAs and DNA sites with low potential to form triple helices. We used different deep learning architectures to build our models, including two-layer convolutional neural networks (CNN) and multilayer perceptrons (MLP). Our DNA2_CNN model performed best, with a mean AUC of 0.78 at a kernel size of 32 and a learning rate of 1e-3. Our deep neural network models revealed several novel lncRNAs and DNA sites, including HOTAIR, MEG3, PARTICLE, DACOR1, MIR100HG, FENDRR, ANRIL, TUG1, MALAT1, LINC00599, TINCR, NEAT1, roX2, DHFR, OTX2-AS1, Xist, SNHG16, ATXN8OS, BCYRN1, TERC, and Khps1, that have the potential to form triplex structures, thereby confirming previous experimental results and those of Triplexator. The performance of our models also supports previous findings that histone modification marks can help identify lncRNAs and DNA regions that have the potential to form RNA:DNA triplex structures.
In conclusion, we showed that different deep learning architectures can recognize lncRNAs and DNA sites that have the potential to form RNA:DNA triplex structures.
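
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean AUC such as the one reported above could be computed. It assumes the mean is taken over cross-validation folds or repeated runs, which the abstract does not specify. The function names `roc_auc` and `mean_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over (labels, scores) pairs."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)


# Toy data (hypothetical): 1 = triplex-forming site, 0 = non-forming site.
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.7]),
]
print(mean_auc(folds))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, so the reported 0.78 sits between these extremes.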

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Frontiers in Genetics; 197 papers in training set; Top 0.4%; 9.1%
2. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 0.3%; 8.4%
3. PLOS ONE; 4510 papers in training set; Top 22%; 8.4%
4. BMC Bioinformatics; 383 papers in training set; Top 2%; 6.4%
5. Briefings in Bioinformatics; 326 papers in training set; Top 1%; 4.9%
6. Computational Biology and Chemistry; 23 papers in training set; Top 0.1%; 4.6%
7. Scientific Reports; 3102 papers in training set; Top 31%; 4.0%
8. Nucleic Acids Research; 1128 papers in training set; Top 6%; 3.6%
9. PLOS Computational Biology; 1633 papers in training set; Top 10%; 3.6%
50% of probability mass above
10. Genes; 126 papers in training set; Top 0.3%; 3.3%
11. Bioinformatics; 1061 papers in training set; Top 6%; 2.7%
12. Molecular Therapy Nucleic Acids; 32 papers in training set; Top 0.3%; 2.1%
13. Computers in Biology and Medicine; 120 papers in training set; Top 2%; 2.1%
14. Methods; 29 papers in training set; Top 0.1%; 1.9%
15. Bioinformatics Advances; 184 papers in training set; Top 2%; 1.9%
16. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.8%
17. ACS Omega; 90 papers in training set; Top 2%; 1.7%
18. Journal of Bioinformatics and Systems Biology; 14 papers in training set; Top 0.2%; 1.7%
19. Biosystems; 18 papers in training set; Top 0.2%; 1.7%
20. F1000Research; 79 papers in training set; Top 2%; 1.7%
21. Database; 51 papers in training set; Top 0.4%; 1.7%
22. BMC Genomics; 328 papers in training set; Top 3%; 1.3%
23. Genomics; 60 papers in training set; Top 1%; 1.3%
24. PeerJ; 261 papers in training set; Top 11%; 0.9%
25. International Journal of Molecular Sciences; 453 papers in training set; Top 13%; 0.9%
26. Biology Methods and Protocols; 53 papers in training set; Top 2%; 0.9%
27. RNA Biology; 70 papers in training set; Top 0.4%; 0.9%
28. Epigenetics; 43 papers in training set; Top 0.9%; 0.8%
29. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 6%; 0.7%
30. RNA; 169 papers in training set; Top 0.5%; 0.7%
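
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below does this for the first eleven entries. It assumes the final percentage on each entry is that journal's predicted probability, and it works from the rounded values shown on the page, so the totals are approximate. The helper name `rank_reaching` is hypothetical.

```python
# Predicted probabilities (percent) for ranks 1-11, copied from the list above.
probs = [9.1, 8.4, 8.4, 6.4, 4.9, 4.6, 4.0, 3.6, 3.6, 3.3, 2.7]


def rank_reaching(probs, threshold):
    """Return (rank, cumulative total) at the first rank whose running
    sum meets the threshold, or (None, total) if it is never reached."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total


rank, total = rank_reaching(probs, 50.0)
print(rank, round(total, 1))
```

With these rounded values the running total is still below 50% after rank 8 and first exceeds it at rank 9, which agrees with the cutoff shown in the list.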