PocketBagger: Generalizable pocket druggability prediction via positive-unlabeled learning
Gingrich, P. W.; Biswas, A.; Mica, I. L.; Brammer, K. M.; Shu, Z.; Maxwell, D. S.; Russell, K. P.; Al-Lazikani, B.
Abstract
Reliable structure-based prediction of small-molecule druggability is hindered by a fundamental labeling problem. Experimentally confirmed liganded sites (positives) are observable, but credible "undruggable" pockets (negatives) are almost impossible to define. Standard supervised machine learning consequently relies on arbitrary definitions of "undruggable", which introduces bias and false negatives. Here we introduce PocketBagger, a positive-unlabeled (PU) learning framework for pocket druggability prediction trained exclusively on experimentally determined Protein Data Bank (PDB) structures. PocketBagger uses PU bagging to learn key features associated with reliably druggable pockets and treats all remaining pockets in the structurally characterized proteome as unlabeled. We demonstrate PocketBagger by training a simple Random Forest classifier, which achieves a recall of 0.804 even when challenged with increasingly difficult generalizability assessments and holdouts of entire protein families. We benchmark the added value of PU learning by comparing PocketBagger to a leading deep-learning predictor. Although demonstrated with a Random Forest, PocketBagger is intended as a framework for any model architecture. Along with the code, the data generated by PocketBagger are deployed in canSAR.ai, providing scalable, generalizable pocket druggability predictions to the drug discovery community.
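The PU bagging idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free, a nearest-centroid rule stands in for the Random Forest, and the function names, the two-column pocket descriptors, and the toy values are all hypothetical. Each round draws a bootstrap sample of unlabeled pockets to act as provisional negatives, fits the base learner against the known positives, and scores only the unlabeled pockets left out of that sample. The out-of-bag votes are then averaged.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=200, seed=0):
    """Fraction of out-of-bag rounds in which each unlabeled item looks positive.

    Assumption: a nearest-centroid rule replaces the Random Forest base learner.
    """
    rng = random.Random(seed)
    n = len(unlabeled)
    bag_size = len(positives)  # balance provisional negatives against positives
    pos_centroid = centroid(positives)
    votes = [0] * n
    counts = [0] * n
    for _ in range(n_rounds):
        # Bootstrap (with replacement) a bag of unlabeled items as provisional negatives.
        draws = rng.choices(range(n), k=bag_size)
        in_bag = set(draws)
        neg_centroid = centroid([unlabeled[i] for i in draws])
        # Score only the items that were left out of this bag.
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue
            counts[i] += 1
            if sq_dist(x, pos_centroid) < sq_dist(x, neg_centroid):
                votes[i] += 1
    # Items that were never out-of-bag get NaN rather than a made-up score.
    return [v / c if c else float("nan") for v, c in zip(votes, counts)]


# Hypothetical, pre-scaled descriptors for each pocket (two arbitrary features).
positives = [[0.9, 0.8], [1.0, 1.0], [0.8, 0.9], [1.1, 0.9]]
unlabeled = [
    [0.95, 0.9], [1.0, 0.85],              # resemble the known positives
    [0.1, 0.0], [0.0, 0.2], [0.2, 0.1],    # do not
    [0.1, 0.1], [0.0, 0.0], [0.15, 0.2],
]
scores = pu_bagging_scores(positives, unlabeled)
```

In this toy setting the first two unlabeled pockets, which resemble the known positives, receive higher averaged scores than the rest. No pocket is ever labeled "undruggable"; the provisional negatives change every round, which is what lets the approach avoid a fixed negative definition.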