Bento: Benchmarking Classical and AI Docking on Drug Design-Relevant Data
Pak, M. A.; Frolova, D.; Nikolenko, S. A.; Daulbaev, T.; Ryabchenko, D.; Litvin, A.; Gurevich, P.; Garifullin, K.; Shapeev, A.; Oseledets, I.; Ivankov, D. N.
Abstract: Recent advances in artificial intelligence have introduced deep learning and co-folding approaches for predicting protein-ligand complexes, raising the question of how applicable they are and how they compare with classical docking methods. In this work, we present a thorough benchmarking study of eleven tools for protein-ligand interaction prediction, spanning classical molecular docking methods, deep learning-based models, and co-folding algorithms. While most related benchmarking efforts primarily assess generalization capacity, we extend the analysis to also evaluate performance on drug design-relevant data and across different classes of protein-ligand complexes. Here, we introduce Bento, a comprehensive benchmark that evaluates these eleven tools, both established and recently developed, across four test datasets and multiple derived subsets in a pocket-aware setup. We show that 1) careful dataset curation is essential: filtering by pocket structural similarity and controlling ligand complexity exposes generalization failures that are obscured in conventional benchmarks; 2) classical and deep learning-based docking tools perform similarly well on drug-like ligands, making them comparably useful for virtual screening, with physics-based methods offering a clear advantage in speed; 3) co-folding tools outperform other approaches on structurally complex ligands, whereas most methods achieve similar accuracy on regular small molecules; and 4) all methods struggle to generalize to unseen pockets, with deep learning models being the most prone to overfitting. Overall, our results show that while current docking and DL-based approaches are reliable for many drug design-relevant scenarios, genuine pocket-level generalization remains an open challenge.
Bento provides a rigorous and transparent framework for diagnosing these limitations and guiding the development of more robust protein-ligand prediction models. The data and code of Bento are available at https://github.com/LigandPro/Bento.
Matching journals
The top 2 journals account for 50% of the predicted probability mass.