
Characterizing Highly Conserved Fragments in 3'UTRs via Computational and Transfer Learning Approaches

Ho, E. S.; Baeck-Hubloux, A.; Dinh, N.; Severino, A.; Troy, C.

2026-01-20 genomics
10.64898/2026.01.19.700376 bioRxiv

3' untranslated regions (3'UTRs) serve as regulatory platforms that modulate translation, mRNA localization, and stability through the sequence-specific binding of regulators such as RNA-binding proteins (RBPs) and miRNAs. These vital binding sites are often identified through orthologous regions among species. A separate but related discovery is the ultraconserved elements (UCEs) detected in the human, rat, and mouse genomes two decades ago. However, our knowledge of their functions is limited. Perplexingly, alterations in UCEs in mouse embryos can still produce viable progeny with no observable phenotypic differences. The majority of UCEs are non-coding, though ~8% are located in 3'UTRs. Given the importance of 3'UTRs in gene regulation, we use a computational approach to identify highly conserved fragments (CFs) in 3'UTRs across diverse mammals, applying criteria appropriate for 3'UTRs (≥50 bp and ≥90% identity). Results show that CFs are not composed of simple repeats or low-complexity regions common to mammalian genomes. Using a transformer-based foundational genomic model, CFs are characterized as A- and T-rich and distinguishable from the 3'UTR background. Thirty-six human CFs from 25 genes are significantly depleted in variations in humans. They are enriched in neuronal tissues and play roles in neurodevelopment and RNA processing, mediated by RBPs and miRNAs. Our findings expand on existing studies that attribute UCEs primarily to enhancer function, suggesting a new path to explore the biological roles of UCEs in 3'UTRs.

Graphical abstract: Created in BioRender. Ho, E. (2026) https://BioRender.com/dcyrx5f

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank. Journal; papers in training set; percentile; predicted probability

1. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 0.1%; 18.7%
2. Scientific Reports; 3102 papers in training set; Top 8%; 9.1%
3. Frontiers in Genetics; 197 papers in training set; Top 0.5%; 8.4%
4. Bioinformatics Advances; 184 papers in training set; Top 1%; 4.0%
5. Bioinformatics; 1061 papers in training set; Top 5%; 4.0%
6. BMC Bioinformatics; 383 papers in training set; Top 3%; 3.6%
7. Nucleic Acids Research; 1128 papers in training set; Top 7%; 3.2%
50% of probability mass above
8. Briefings in Bioinformatics; 326 papers in training set; Top 3%; 2.6%
9. Genes; 126 papers in training set; Top 0.6%; 2.1%
10. Epigenetics; 43 papers in training set; Top 0.3%; 2.1%
11. GigaScience; 172 papers in training set; Top 1%; 1.7%
12. iScience; 1063 papers in training set; Top 15%; 1.7%
13. PLOS ONE; 4510 papers in training set; Top 54%; 1.7%
14. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 1%; 1.7%
15. Nature Communications; 4913 papers in training set; Top 51%; 1.7%
16. Computational Biology and Chemistry; 23 papers in training set; Top 0.1%; 1.7%
17. International Journal of Molecular Sciences; 453 papers in training set; Top 9%; 1.5%
18. BMC Genomics; 328 papers in training set; Top 3%; 1.3%
19. BMC Biology; 248 papers in training set; Top 2%; 1.3%
20. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.3%
21. PLOS Computational Biology; 1633 papers in training set; Top 19%; 1.3%
22. Life Science Alliance; 263 papers in training set; Top 1%; 0.9%
23. Genetic Epidemiology; 46 papers in training set; Top 0.7%; 0.9%
24. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 6%; 0.8%
25. Genomics; 60 papers in training set; Top 2%; 0.8%
26. Database; 51 papers in training set; Top 1.0%; 0.7%
27. Heliyon; 146 papers in training set; Top 7%; 0.7%
28. Gene; 41 papers in training set; Top 2%; 0.7%
29. Journal of Genetics and Genomics; 36 papers in training set; Top 2%; 0.7%
30. Scientific Data; 174 papers in training set; Top 3%; 0.7%
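
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running total. The sketch below is only an illustration: the function name `journals_to_reach` and the variable `probs` are hypothetical, and the values are the listed percentages for the ten top-ranked journals, assuming the last figure in each entry is the predicted probability.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten top-ranked journals, copied from the list above.
# Assumption: the final percentage in each entry is the predicted probability.
probs = [18.7, 9.1, 8.4, 4.0, 4.0, 3.6, 3.2, 2.6, 2.1, 2.1]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running total to reach threshold."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # threshold is never reached with the entries given


top_n = journals_to_reach(probs, 50.0)
print(top_n)
```

With these values the running total first reaches 50% at the seventh journal, which agrees with the position of the "50% of probability mass above" divider.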