Large-scale analysis of ligand binding mode similarities in the PDB using interaction fingerprints
Kunnakkattu, I. R.; Choudhary, P.; Midlik, A.; Fleming, J. R.; Balasubramaniyan, B.; Sasidharan Nair, S.; Velankar, S.
Three-dimensional structures of protein-ligand complexes are essential for understanding the molecular principles that govern ligand recognition and binding. With more than 180,000 ligand-bound entries in the Protein Data Bank (PDB), representing over two million individual complexes, the available structural data offers unprecedented opportunities for large-scale analysis of interaction patterns. Such analysis across the PDB archive can reveal similarities and differences in the binding modes of ligands, assisting drug discovery. However, large-scale analysis of up-to-date information remains a significant challenge because of the rapid growth of the data. Here, we introduce the Extended Connectivity Interaction Fingerprint (ECIFP), an interaction-based fingerprint that reduces 3D protein-ligand contact information to a fingerprint while retaining key molecular and chemical features of the interacting fragments. This simpler representation of the interaction data makes comparison of millions of protein-ligand complexes tractable. Benchmarking shows that ECIFP outperforms ligand-only Extended Connectivity Fingerprints in identifying similar binding sites across identical protein sequences occupied by chemically diverse ligands. Our analysis shows that similarities calculated using ECIFP can be used to compare macromolecular complexes with similar or different ligands. We demonstrate two large-scale applications of ECIFP: (1) identification of distinct binding modes for over 9,000 ligands across the entire PDB, and (2) detection of binding-mode similarities among structurally diverse ligands within the same binding site across 48,870 binding sites from over 21,000 proteins.
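To illustrate why a fingerprint representation makes pairwise comparison cheap, the sketch below scores two fingerprints with the Tanimoto coefficient. This is an assumption for illustration only: the abstract does not state which similarity measure ECIFP uses, and the function name `tanimoto` and the bit indices are hypothetical, not taken from the authors' implementation.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints,
    each given as the set of its 'on' bit indices."""
    if not a and not b:
        # Convention chosen here: two empty fingerprints score 0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints for three protein-ligand complexes.
fp_a = frozenset({3, 17, 42, 101})
fp_b = frozenset({3, 42, 101, 256})
fp_c = frozenset({5, 8})

sim_ab = tanimoto(fp_a, fp_b)  # 3 shared bits out of 5 distinct bits
sim_ac = tanimoto(fp_a, fp_c)  # no shared bits

print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

Each comparison reduces to a set intersection and two length lookups, so the cost does not depend on the 3D coordinates of the original complexes.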