Sequence Design and Phylogenetic Inference with Generative Flow Networks

Huang, Q.; Mourra-Diaz, C. M.; Wen, X.; Payette, D.

2026-04-09 synthetic biology
10.64898/2026.04.08.717239 bioRxiv

Abstract: Phylogenetic inference remains computationally challenging due to the exponentially growing tree topology search space, and current methods rely heavily on multiple sequence alignments (MSAs), which are expensive and error-prone. We propose AncestorGFN, a proof-of-concept approach leveraging Generative Flow Networks (GFlowNets) for simultaneous sequence generation and phylogenetic exploration without requiring explicit MSAs. Our method learns to generate sequences matching a target distribution while the flow trajectories implicitly encode structural relationships among sequences. We demonstrate that greedy traceback on maximum-flow trajectories recovers shared intermediate states suggestive of common ancestry, and evaluate on the let-7 microRNA family where the learned flow structure qualitatively captures phylogenetic branching patterns. Furthermore, beam search at inference time discovers novel sequences clustering near known targets, suggesting applications in de novo sequence design. This work establishes an initial foundation for alignment-free phylogenetic exploration using generative models.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Methods; 336 papers in training set; Top 0.5%; 14.6%
2. Nature Communications; 4913 papers in training set; Top 16%; 10.4%
3. Science; 429 papers in training set; Top 3%; 8.3%
4. Nature Computational Science; 50 papers in training set; Top 0.1%; 8.3%
5. Cell Systems; 167 papers in training set; Top 2%; 6.3%
6. Nucleic Acids Research; 1128 papers in training set; Top 4%; 4.8%

50% of probability mass above

7. Nature Biotechnology; 147 papers in training set; Top 2%; 4.8%
8. Nature; 575 papers in training set; Top 6%; 3.9%
9. Briefings in Bioinformatics; 326 papers in training set; Top 2%; 3.2%
10. Nature Machine Intelligence; 61 papers in training set; Top 1%; 2.6%
11. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 26%; 2.3%
12. Bioinformatics; 1061 papers in training set; Top 7%; 2.1%
13. Cell; 370 papers in training set; Top 9%; 2.1%
14. Communications Biology; 886 papers in training set; Top 5%; 2.1%
15. PLOS Computational Biology; 1633 papers in training set; Top 16%; 1.7%
16. Neuron; 282 papers in training set; Top 6%; 1.7%
17. Advanced Science; 249 papers in training set; Top 11%; 1.6%
18. Genome Biology; 555 papers in training set; Top 5%; 1.6%
19. Genome Research; 409 papers in training set; Top 3%; 1.3%
20. iScience; 1063 papers in training set; Top 22%; 1.2%
21. eLife; 5422 papers in training set; Top 54%; 0.9%
22. PLOS ONE; 4510 papers in training set; Top 64%; 0.9%
23. Nature Plants; 84 papers in training set; Top 2%; 0.7%
24. Science Advances; 1098 papers in training set; Top 30%; 0.7%
25. ACS Synthetic Biology; 256 papers in training set; Top 3%; 0.7%
26. Cell Reports; 1338 papers in training set; Top 35%; 0.6%
27. Journal of The Royal Society Interface; 189 papers in training set; Top 5%; 0.6%
28. Philosophical Transactions of the Royal Society B; 51 papers in training set; Top 7%; 0.6%
29. Scientific Reports; 3102 papers in training set; Top 78%; 0.6%
30. Frontiers in Molecular Biosciences; 100 papers in training set; Top 6%; 0.6%