
Evolutionary conditioning enables guided generation of functionally diverse enhancers

Duncan, A. G.; Consens, M. E.; Crawford, L.; Mitchell, J. A.; Moses, A. M.; Yang, K. K.; Lu, A. X.

2026-04-15 bioengineering
10.64898/2026.04.13.718170 bioRxiv

Deep learning has been instrumental in understanding how enhancers encode regulatory information in their DNA sequences, and it has shown preliminary success in enhancer design. However, the prevailing approach to enhancer design, conditioning on a cell-type label, depends on labeled data from massively parallel reporter assays, which exist for only a handful of cell types. We propose EnhancAR, an autoregressive model trained on sets of unaligned homologous enhancer sequences to learn the enhancer function conserved over evolution and to generate sequences that resemble real homologs. Trained on 1.7 million human enhancer homolog sets spanning 1,888 cell types, EnhancAR generates enhancers for a variety of contexts without being conditioned on a cell-type label. We show computationally that, when conditioned on a set of enhancer homologs, EnhancAR generates novel and diverse sequences that preserve the functional properties of the homologs. By prompting EnhancAR with homologs of existing cell-type-specific enhancers, we design enhancers with similar predicted cell-type specificity. We further demonstrate that, when trained on length-sorted homologs, EnhancAR can design enhancers that are shorter than the conditioning homologs yet preserve their predicted activity. In summary, we find that leveraging evolutionary information in enhancer homologs enables a more flexible and general paradigm for designing enhancers with specific functions.
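The abstract does not specify EnhancAR's prompt format or architecture, so the following is only a minimal sketch of the general idea of homolog-conditioned autoregressive generation, under stated assumptions. It assumes unaligned homologs are joined into one prompt with a hypothetical `|` separator, and it substitutes a simple k-mer Markov chain fit to the prompt's homologs for the learned model; that chain is a toy stand-in, not EnhancAR. The names `build_prompt`, `fit_kmer_model`, `sample_next_homolog` and the `|`, `^`, `$` symbols are all hypothetical. The optional longest-first ordering illustrates the length-sorted setup; the toy chain ignores homolog order, so the ordering would only matter for a model that reads the prompt left to right.

```python
import random
from collections import Counter, defaultdict

# Hypothetical prompt symbols (assumptions, not EnhancAR's actual vocabulary).
SEP = "|"    # separates homologs in the prompt
START = "^"  # left padding so early positions have a context
END = "$"    # end-of-sequence marker used by the toy model


def build_prompt(homologs, sort_by_length=False):
    """Join unaligned homologs into one prompt string (hypothetical format).

    With sort_by_length=True the homologs are ordered longest-first, so a model
    trained on such prompts would learn to continue with a shorter sequence.
    """
    seqs = sorted(homologs, key=len, reverse=True) if sort_by_length else list(homologs)
    return SEP.join(seqs) + SEP


def fit_kmer_model(homologs, k=3):
    """Toy stand-in for a learned model: next-symbol counts given the previous k symbols."""
    counts = defaultdict(Counter)
    for seq in homologs:
        padded = START * k + seq + END
        for i in range(k, len(padded)):
            counts[padded[i - k:i]][padded[i]] += 1
    return counts


def sample_next_homolog(prompt, k=3, max_len=500, seed=0):
    """Autoregressively sample one new sequence conditioned on the homologs in `prompt`."""
    homologs = [s for s in prompt.split(SEP) if s]
    counts = fit_kmer_model(homologs, k)
    rng = random.Random(seed)
    context, out = START * k, []
    while len(out) < max_len:
        options = counts.get(context)
        if not options:  # context never seen in the homologs: stop
            break
        symbols, weights = zip(*options.items())
        nxt = rng.choices(symbols, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt  # slide the k-symbol window
    return "".join(out)


homologs = ["ACGTTGCAACGT", "ACGTTGCACGT", "ACGATGCAACGTT"]
prompt = build_prompt(homologs, sort_by_length=True)
print(prompt)  # ACGATGCAACGTT|ACGTTGCAACGT|ACGTTGCACGT|
print(sample_next_homolog(prompt))
```

Each symbol is drawn conditioned only on the preceding k symbols, with statistics taken from the conditioning homologs, so the output recombines motifs seen in the set. A real model would instead learn, across many homolog sets, which features are conserved and would condition on the whole prompt.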

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank. Journal: papers in training set, percentile, predicted probability

1. Cell Systems: 167 papers, Top 0.1%, 40.1%
2. Genome Biology: 555 papers, Top 0.3%, 10.3%
50% of probability mass above
3. Nature Communications: 4913 papers, Top 26%, 6.9%
4. Neuron: 282 papers, Top 3%, 4.0%
5. Science: 429 papers, Top 8%, 4.0%
6. Proceedings of the National Academy of Sciences: 2130 papers, Top 19%, 3.7%
7. Nucleic Acids Research: 1128 papers, Top 7%, 2.7%
8. Nature Machine Intelligence: 61 papers, Top 1%, 2.7%
9. Nature Methods: 336 papers, Top 4%, 2.4%
10. Cancer Research: 116 papers, Top 1%, 2.1%
11. Nature Genetics: 240 papers, Top 4%, 1.9%
12. Cell Genomics: 162 papers, Top 3%, 1.8%
13. Nature: 575 papers, Top 11%, 1.7%
14. Nature Biotechnology: 147 papers, Top 6%, 1.2%
15. Cell Reports: 1338 papers, Top 28%, 1.2%
16. Nature Neuroscience: 216 papers, Top 5%, 1.2%
17. PLOS Computational Biology: 1633 papers, Top 19%, 1.2%
18. Advanced Science: 249 papers, Top 16%, 0.9%
19. Science Advances: 1098 papers, Top 26%, 0.9%
20. Genome Research: 409 papers, Top 4%, 0.8%
21. Nature Biomedical Engineering: 42 papers, Top 2%, 0.8%
22. Philosophical Transactions of the Royal Society B: 51 papers, Top 7%, 0.5%
23. Nature Medicine: 117 papers, Top 6%, 0.5%
24. PLOS ONE: 4510 papers, Top 73%, 0.5%
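The statement that the top 2 journals account for 50% of the predicted probability mass is a cumulative sum over the ranked list. A short Python sketch of that arithmetic, using the first five entries above, might look like this; the helper name `journals_covering` is hypothetical and not part of the matching tool.

```python
# First five entries of the ranked list above: (journal, predicted probability in %).
predictions = [
    ("Cell Systems", 40.1),
    ("Genome Biology", 10.3),
    ("Nature Communications", 6.9),
    ("Neuron", 4.0),
    ("Science", 4.0),
]


def journals_covering(preds, target_pct):
    """Return the fewest top-ranked journals whose probabilities sum to at least target_pct."""
    # Sort by probability, highest first; ties keep their listed order (stable sort).
    ranked = sorted(preds, key=lambda item: item[1], reverse=True)
    total = 0.0
    for k, (_, pct) in enumerate(ranked, start=1):
        total += pct
        if total >= target_pct:
            return [name for name, _ in ranked[:k]], total
    return [name for name, _ in ranked], total  # target not reached by the listed entries


names, mass = journals_covering(predictions, 50.0)
print(names, round(mass, 1))  # ['Cell Systems', 'Genome Biology'] 50.4
```

Applying the same sum to all 24 listed entries gives about 94.0% (from the rounded values), so journals outside this list share roughly the remaining 6% of the predicted mass.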