TwinCell: Large Causal Cell Model for Reliable and Interpretable Therapeutic Target Prioritisation
Morlot, J.-B.; Dias, T.; Hatem, E.; Abraham, Y.
Drug discovery is impeded by the difficulty of translating targets from preclinical models to patients. In this work, we present TwinCell, a Large Causal Cell Model for target identification that is trained on in vitro cancer cell line perturbation data, generalises to patient-derived cell types, and provides biologically meaningful interpretations of its predictions. Rather than predicting perturbation outcomes, TwinCell identifies the upstream regulators most likely to drive the transition between two cell states, such as diseased and healthy, by decomposing target probability over signalling paths through a multiomics interactome conditioned on single-cell foundation model embeddings. To validate this approach, we introduce TwinBench, a benchmarking framework that evaluates virtual cell models using recommendation system metrics and corrects for mode collapse through empirical p-value estimation. In both zero-shot in vitro scenarios and in clinico validation across five therapeutic areas, TwinCell outperforms not only state-of-the-art virtual cell models but also the linear baselines and network-based methods classically used for target identification. When applied to patient data, TwinCell recovers clinically approved drug targets and reconstructs known disease mechanisms, such as the type I interferon signalling cascade in Systemic Lupus Erythematosus, without any disease-specific supervision. These results demonstrate that constraining learned perturbation patterns to a biological interactome enables cross-tissue, cross-disease target identification with mechanistic interpretability, bridging the gap between high-throughput in vitro experiments and clinical insights.
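The abstract does not specify which recommendation metrics or null model TwinBench uses. The sketch below is therefore hypothetical. It assumes hit@k and mean reciprocal rank as the ranking metrics. It also assumes that the empirical p-value compares the rank a model gives a known target for one cell-state transition against the ranks it gives that same target for other transitions. Under that assumption, a mode-collapsed model that always places the same gene at the top earns a good rank but no significance. All function names, gene names, and scores are illustrative, not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers: a model's output for one cell-state transition is
# assumed to be a dict mapping candidate target gene -> score (higher = more likely).


def rank_of(target: str, scores: Dict[str, float]) -> int:
    """1-based rank of `target` when candidates are sorted by descending score."""
    ordered = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return ordered.index(target) + 1


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose true target appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over queries (1.0 means always ranked first)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def empirical_p_value(
    target: str,
    query_scores: Dict[str, float],
    background_scores: List[Dict[str, float]],
) -> float:
    """Assumed null: how often does the model rank `target` at least as well
    for *other* transitions? A collapsed model that ranks the target highly
    everywhere gets p close to 1, so its good rank is not rewarded.
    Uses the standard (1 + count) / (1 + n) empirical estimator."""
    observed = rank_of(target, query_scores)
    as_good = sum(rank_of(target, s) <= observed for s in background_scores)
    return (1 + as_good) / (1 + len(background_scores))


# Toy example (hypothetical scores).
collapsed = {"IFNAR1": 0.9, "JAK1": 0.5, "STAT1": 0.1}
p_collapsed = empirical_p_value("IFNAR1", collapsed, [collapsed] * 3)

specific_query = {"IFNAR1": 0.9, "JAK1": 0.5, "STAT1": 0.1}
specific_background = [
    {"IFNAR1": 0.10, "JAK1": 0.9, "STAT1": 0.5},
    {"IFNAR1": 0.20, "JAK1": 0.3, "STAT1": 0.8},
    {"IFNAR1": 0.05, "JAK1": 0.6, "STAT1": 0.7},
]
p_specific = empirical_p_value("IFNAR1", specific_query, specific_background)

print(p_collapsed, p_specific)
```

In this toy setup both models rank the target first for the query transition. Only the context-specific model obtains a small p-value (0.25 against 1.0), which shows how a rank-based metric alone would overstate a collapsed model.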
Matching journals
The top 7 journals account for 50% of the predicted probability mass.