Joint Modeling of Transcriptomic and Morphological Phenotypes for Generative Molecular Design

Verma, S.; Wang, M.; Jayasundara, S.; Malusare, A. M.; Wang, L.; Grama, A.; Kazemian, M.; Lanman, N. A.

2026-02-04 bioinformatics
10.64898/2026.02.02.703193 bioRxiv

Motivation: Phenotypic drug discovery generates rich multi-modal biological data from transcriptomic and morphological measurements, yet translating complex cellular responses into molecular design remains a computational bottleneck. Existing generative methods operate on single modalities and condition on post-treatment measurements without leveraging paired control-treatment dynamics to capture perturbation effects.

Results: We present Pert2Mol, the first framework for multi-modal phenotype-to-structure generation that integrates transcriptomic and morphological features from paired control-treatment experiments. Pert2Mol employs bidirectional cross-attention between control and treatment states to capture perturbation dynamics, conditioning a rectified flow transformer that generates molecular structures along straight-line trajectories. We introduce Student-Teacher Self-Representation (SERE) learning to stabilize training in high-dimensional multi-modal spaces. On the GDP dataset, Pert2Mol achieves a Fréchet ChemNet Distance of 4.996, compared to 7.343 for diffusion baselines and 59.114 for transcriptomics-only methods, while maintaining perfect molecular validity and appropriate physicochemical property distributions. The model demonstrates 84.7% scaffold diversity and generates molecules 12.4 times faster than diffusion approaches, with deterministic sampling suitable for hypothesis-driven validation.

Availability: Code and pretrained models will be available at https://github.com/wangmengbo/Pert2Mol.
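The abstract's mention of a rectified flow that generates along straight-line trajectories with deterministic sampling can be illustrated with a toy example. The sketch below is not Pert2Mol's implementation: `euler_sample` and `toy_velocity` are hypothetical names, and the constant velocity field stands in for a learned, phenotype-conditioned network. It only shows why a perfectly straight trajectory can be integrated from noise to target in very few Euler steps, and why the result is deterministic for a fixed starting point.

```python
import numpy as np


def euler_sample(velocity, x0, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x


# Toy setup (assumption): one noise vector and one target vector.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
target = np.array([1.0, -2.0, 0.5, 3.0])


def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity network. For a single
    # (noise, target) pair on a straight path, the ideal velocity is the
    # constant displacement target - noise.
    return target - noise


# With a constant velocity the path is a straight line, so one Euler step
# and ten Euler steps end at the same point.
sample_1 = euler_sample(toy_velocity, noise, num_steps=1)
sample_10 = euler_sample(toy_velocity, noise, num_steps=10)
```

In a real model the velocity would depend on the current state, the time, and the conditioning signal, so the learned paths are only approximately straight and more than one step may be needed.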

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set; Top 1.0%; 23.3%
2. Journal of Chemical Information and Modeling: 207 papers in training set; Top 0.5%; 10.5%
3. Nature Communications: 4913 papers in training set; Top 22%; 8.5%
4. Journal of Cheminformatics: 25 papers in training set; Top 0.1%; 4.5%
5. Briefings in Bioinformatics: 326 papers in training set; Top 1%; 4.3%
(50% of probability mass above)
6. Nature Machine Intelligence: 61 papers in training set; Top 0.8%; 3.7%
7. Patterns: 70 papers in training set; Top 0.2%; 3.7%
8. Cell Systems: 167 papers in training set; Top 3%; 3.7%
9. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.4%
10. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 3%; 2.2%
11. BMC Bioinformatics: 383 papers in training set; Top 4%; 2.2%
12. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 29%; 2.0%
13. Advanced Science: 249 papers in training set; Top 9%; 2.0%
14. Nature Methods: 336 papers in training set; Top 4%; 1.8%
15. PLOS Computational Biology: 1633 papers in training set; Top 15%; 1.8%
16. iScience: 1063 papers in training set; Top 19%; 1.4%
17. Chemical Science: 71 papers in training set; Top 1%; 1.1%
18. GigaScience: 172 papers in training set; Top 2%; 0.9%
19. Cell Reports Medicine: 140 papers in training set; Top 7%; 0.8%
20. Nucleic Acids Research: 1128 papers in training set; Top 16%; 0.8%
21. Communications Biology: 886 papers in training set; Top 20%; 0.8%
22. Scientific Reports: 3102 papers in training set; Top 72%; 0.8%
23. PLOS ONE: 4510 papers in training set; Top 65%; 0.8%
24. Nature Biomedical Engineering: 42 papers in training set; Top 2%; 0.8%
25. ACS Synthetic Biology: 256 papers in training set; Top 3%; 0.8%
26. Cell Genomics: 162 papers in training set; Top 6%; 0.8%
27. Artificial Intelligence in the Life Sciences: 11 papers in training set; Top 0.2%; 0.8%
28. npj Systems Biology and Applications: 99 papers in training set; Top 2%; 0.8%
29. Nature Genetics: 240 papers in training set; Top 8%; 0.7%
30. Genome Medicine: 154 papers in training set; Top 9%; 0.7%
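
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below copies the first ten probability values from the list above; the helper name `smallest_k_reaching` is hypothetical, and because the listed values are rounded to one decimal place the cumulative total is approximate.

```python
# Predicted probabilities (in percent) for ranks 1-10, copied from the list.
probabilities = [23.3, 10.5, 8.5, 4.5, 4.3, 3.7, 3.7, 3.7, 2.4, 2.2]


def smallest_k_reaching(probs, threshold):
    """Return (k, cumulative mass) for the first rank k whose running total
    reaches the threshold, or (None, total) if it is never reached."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total


k, mass = smallest_k_reaching(probabilities, 50.0)
print(f"Top {k} journals cover about {mass:.1f}% of the probability mass")
```

The running total first crosses 50% at rank 5, which matches the position of the "50% of probability mass above" marker in the list.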