Influence of molecular representation and charge on protein-ligand structural predictions by popular co-folding methods

Bugrova, A.; Orekhov, P.; Gushchin, I.

2026-02-18 bioinformatics

10.64898/2026.02.18.706547 bioRxiv

Recently developed deep learning-based tools can effectively generate structural models of complexes of proteins and non-proteinaceous compounds. While some of their predictive capabilities are truly exciting, others remain to be thoroughly tested. Here, we probe whether the ligand input format (Chemical Component Dictionary, CCD, or Simplified Molecular Input Line Entry System, SMILES) and charge (which depends on protonation) affect the predictions of four popular algorithms: AlphaFold 3, Boltz-2, Chai-1, and Protenix-v1. We chose methylamine and acetic acid as two of the simplest titratable chemicals; they are omnipresent in proteins as amino and carboxy moieties and are consequently ubiquitous in the Protein Data Bank models that are most commonly used for training. Unexpectedly, we found that for both molecules the input format affected the prediction results in many cases, and did so much more strongly than protonation, whereas changes in the formally specified charge of the molecules did not lead to the changes in binding expected from experiments. We conclude that (i) ensuring identical results irrespective of input format and (ii) including protonation-related steps in training and prediction pipelines are the two available paths for improving protein-ligand structure prediction algorithms.
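To make concrete what "formally specified charge" means for a SMILES input, the following minimal sketch lists the two protonation states of each test molecule and sums the formal charges written in bracket atoms. The SMILES strings and the helper `net_formal_charge` are illustrative assumptions, not inputs or code taken from the study, and the parser handles only explicit bracket-atom charges rather than full SMILES.

```python
import re

# Illustrative SMILES for the two titratable test molecules.
# These strings are assumptions for this sketch, not the paper's exact inputs.
LIGANDS = {
    "methylamine (neutral)": "CN",
    "methylammonium (protonated)": "C[NH3+]",
    "acetic acid (neutral)": "CC(=O)O",
    "acetate (deprotonated)": "CC(=O)[O-]",
}


def net_formal_charge(smiles: str) -> int:
    """Sum the formal charges declared in SMILES bracket atoms.

    Hypothetical helper: it reads only explicit charges such as
    [NH3+], [O-], [Fe+2] or [Fe++], and does no chemistry validation.
    """
    total = 0
    for atom in re.findall(r"\[([^\]]*)\]", smiles):
        atom = atom.split(":")[0]  # drop an optional atom-class suffix
        match = re.search(r"(\++|-+)(\d*)$", atom)
        if match is None:
            continue  # bracket atom without a charge, e.g. [13C]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


if __name__ == "__main__":
    for name, smiles in LIGANDS.items():
        print(f"{name}: {smiles} -> net charge {net_formal_charge(smiles)}")
```

In a SMILES input the protonation state is therefore explicit in the string itself, which is one way a charge can be formally specified to a co-folding tool.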
