A Generative Neuro-Symbolic AI for Protein Sequence Design

Defresne, M.; Dessaux, D.; Buchet, S.; Barthe, L.; Ammar-Khodja, L.; Azizi, B.; Durante, V.; Cioci, G.; de Givry, S.; Roussel, A.; Garcia-Alles, L.; Schiex, T.; Barbe, S.

2026-04-02 bioengineering
10.64898/2026.03.31.715526 bioRxiv

Deep learning has revolutionized computational protein design, enabling the generation of sequences that fold onto target backbones with unprecedented accuracy. However, state-of-the-art inverse folding tools largely rely on auto-regressive sampling. While powerful, this paradigm is increasingly recognized for its inability to "think ahead", a crucial capacity to reliably create the complex, long-range inter-residue dependencies essential for most biological functions. To overcome these fundamental limitations, we introduce EffieDes, a generative neuro-symbolic AI framework that synergizes the predictive capabilities of deep learning with the logical precision of automated reasoning. EffieDes leverages deep learning to encode the target backbone's fitness landscape into Effie, a fully decomposable probabilistic graphical model (Potts model). This landscape is then rigorously explored by an automated reasoning prover to identify sequences that simultaneously satisfy complex design constraints and optimize backbone fitness. We validated this neuro-symbolic approach through the design of orthogonal sequence pairs that adopt identical folds but exhibit selective self-assembly, as well as the design of a de novo selective nanobody with nanomolar affinity for an immune-evasive SARS-CoV-2 variant. EffieDes provides a robust architecture for precisely dissecting learned fitness landscapes, offering a new path toward proteins with highly optimized performance and sophisticated functional objectives.
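
To make the idea of a decomposable (Potts) fitness landscape searched under design constraints more concrete, here is a minimal toy sketch in Python. It is not the EffieDes implementation: the three-letter alphabet, the energy tables, and the function names `potts_energy` and `best_sequence` are all hypothetical, and plain exhaustive enumeration stands in for the automated reasoning prover mentioned in the abstract. It assumes the common Potts form in which a sequence's score is a sum of per-position terms and pairwise terms, with lower energy meaning better fitness.

```python
from itertools import product

# Toy alphabet (assumption for illustration; real designs use 20 amino acids).
ALPHABET = "AKE"


def potts_energy(seq, unary, pairwise):
    """Energy of `seq` under a pairwise-decomposable model (lower is better)."""
    # Per-position terms: one lookup table per sequence position.
    energy = sum(unary[i][aa] for i, aa in enumerate(seq))
    # Pairwise terms: one sparse table per coupled position pair (i, j).
    for (i, j), table in pairwise.items():
        energy += table.get((seq[i], seq[j]), 0.0)
    return energy


def best_sequence(length, unary, pairwise, constraint=lambda s: True):
    """Exhaustively find the lowest-energy sequence satisfying `constraint`.

    Brute force is only viable for toy sizes; it stands in for an exact
    solver here and is NOT how a real design tool would search.
    """
    best, best_e = None, float("inf")
    for seq in product(ALPHABET, repeat=length):
        if not constraint(seq):
            continue
        e = potts_energy(seq, unary, pairwise)
        if e < best_e:
            best, best_e = "".join(seq), e
    return best, best_e


# Hypothetical landscape for a 3-residue "protein".
unary = [
    {"A": 0.0, "K": -1.0, "E": 0.5},
    {"A": -0.5, "K": 0.0, "E": 0.0},
    {"A": 0.0, "K": 0.5, "E": -1.0},
]
# Long-range coupling between positions 0 and 2: opposite charges are
# rewarded, like charges are penalized.
pairwise = {(0, 2): {("K", "E"): -2.0, ("K", "K"): 3.0, ("E", "E"): 3.0}}

unconstrained = best_sequence(3, unary, pairwise)
# Example design constraint: forbid lysine at the first position.
no_lysine_at_0 = best_sequence(3, unary, pairwise, constraint=lambda s: s[0] != "K")

print(unconstrained)   # ('KAE', -4.5)
print(no_lysine_at_0)  # ('AAE', -1.5)
```

In this toy case the constraint changes the optimum from `KAE` to `AAE`: once lysine is forbidden at position 0, the favorable coupling with position 2 is lost and the whole sequence is re-optimized jointly, which is the kind of global reasoning that position-by-position sampling does not guarantee.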

Matching journals

The top 3 journals account for 50% of the predicted probability mass.