In the context of high-throughput virtual screening, the automatic detection of potentially druggable protein cavities is of vital interest. I will present different combinatorial geometric representations of a protein in 3D space and show how these ideas can be applied to the detection of drug binding sites with some machine learning black magic. This is joint work with T. Krick and M. Martí.
A protein is a long chain of amino acid residues, folded into a 3D shape that defines functional cavities. Think of protein structure data as the positions of its atoms, that is, (named) points in space. Our goal is the automatic detection of potentially druggable cavities. More precisely, from the information in a protein structure, we want to identify zones where a small molecule could bind. These target candidates are cavities with both geometric and chemical properties that we want to model and learn. In this work I present different combinatorial models and how they can be used to identify cavities. Independently, a machine learning algorithm is trained to identify possible binding spots. Residues are subsequently clustered together, taking into account both geometry and probability of binding. Finally, these clusters are classified as target candidates or not.
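To make the last two steps concrete, here is a minimal sketch of clustering residues by geometry and binding probability and then classifying the clusters. It is an illustration only, not the method presented in the talk: the function names (`cluster_residues`, `is_candidate`), the single-linkage rule, the distance cutoff, and all thresholds are assumptions chosen for the example, and the coordinates and probabilities are made-up toy values standing in for the output of some trained classifier.

```python
from itertools import combinations
from math import dist


def cluster_residues(coords, probs, cutoff=8.0, min_prob=0.5):
    """Group likely-binding residues that lie close together.

    coords: one (x, y, z) position per residue (hypothetical units: angstroms)
    probs:  one binding probability per residue (assumed classifier output)
    Residues below min_prob are discarded; the rest are linked whenever
    their distance is at most cutoff (single-linkage, via union-find).
    """
    keep = [i for i, p in enumerate(probs) if p >= min_prob]
    parent = {i: i for i in keep}

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(keep, 2):
        if dist(coords[i], coords[j]) <= cutoff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in keep:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def is_candidate(cluster, probs, min_size=3, min_mean_prob=0.6):
    """Toy classification rule: large enough and confident enough on average."""
    mean_prob = sum(probs[i] for i in cluster) / len(cluster)
    return len(cluster) >= min_size and mean_prob >= min_mean_prob


# Toy data: two spatial groups of residues plus one unlikely binder.
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (30, 0, 0), (33, 0, 0), (15, 15, 15)]
probs = [0.9, 0.8, 0.7, 0.9, 0.55, 0.1]

clusters = cluster_residues(coords, probs)
candidates = [c for c in clusters if is_candidate(c, probs)]
print(clusters)    # [[0, 1, 2], [3, 4]]
print(candidates)  # [[0, 1, 2]]
```

In this sketch the geometry enters only through pairwise distances; a richer combinatorial model of the protein could replace the distance test with adjacency in that model without changing the rest of the code.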