In this talk we show how Graph Nets (GNs), a set of statistical models that operate directly on molecular topology by updating and aggregating information between atoms and bonds, can approximate per-atom, per-bond, and per-molecule properties derived from quantum mechanics (QM), with errors within the uncertainty of the QM calculations themselves and a more than 500-fold speedup.
Computational models applied at the early stages of drug discovery campaigns have shown the potential to reduce the failure rate, duration, and cost of designing a new drug. These models are trained on computationally expensive QM data to achieve an appropriate level of accuracy. Here, we explore the possibility of approximating such calculations with methods that are faster but just as accurate. We introduce a GN model in which both nodes (atoms) and edges (bonds) carry attributes, so that the problem can be framed as multitask regression. Because they operate directly on the topological space of molecules, GNs inherently preserve permutation invariance and equivariance and allow maximal sharing of latent representations among atoms and bonds. To demonstrate this, we examine the behavior of a GN model when it predicts per-atom, per-bond, and per-molecule properties independently, jointly, or one given the others. The package that implements the GN model (github.com/choderalab/gimlet) is written in Python with TensorFlow 2.0 and has no other dependencies. Reading and writing of molecules, aromaticity perception, typing, and simple molecular mechanics energy calculations are all written as tf.functions and can therefore be compiled into graphs and parallelized when dealing with large datasets.
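The updating and aggregating described above, and the permutation properties that follow from it, can be illustrated with a toy sketch. This is not the gimlet API or the model from the talk: the function name `gn_round`, the scalar features, and the purely additive update rules are all assumptions chosen for readability, whereas a real GN would use learned update functions on feature vectors. The sketch runs one round of message passing on a three-atom chain and checks that relabeling the atoms permutes the per-atom outputs (equivariance) while leaving the per-molecule output unchanged (invariance).

```python
# Toy illustration only: hypothetical names, fixed additive updates,
# scalar features. Not the gimlet implementation.

def gn_round(atoms, bonds, bond_feats):
    """Run one round of message passing on an attributed molecular graph.

    atoms:      list of float atom (node) features
    bonds:      list of (i, j) atom-index pairs
    bond_feats: list of float bond (edge) features, aligned with `bonds`
    Returns (per-atom, per-bond, per-molecule) outputs.
    """
    # Edge update: each bond combines its own feature with both end atoms.
    # The rule is symmetric in i and j, so bond direction does not matter.
    new_bonds = [e + atoms[i] + atoms[j] for (i, j), e in zip(bonds, bond_feats)]

    # Node update: each atom sums the updated features of its incident bonds.
    incoming = [0.0] * len(atoms)
    for (i, j), e in zip(bonds, new_bonds):
        incoming[i] += e
        incoming[j] += e
    new_atoms = [a + m for a, m in zip(atoms, incoming)]

    # Global readout: a sum over all atoms and bonds, independent of ordering.
    molecule = sum(new_atoms) + sum(new_bonds)
    return new_atoms, new_bonds, molecule


# A three-atom chain 0-1-2 with toy features (atomic numbers, bond orders).
atoms = [1.0, 6.0, 7.0]
bonds = [(0, 1), (1, 2)]
bond_feats = [1.0, 3.0]
new_atoms, new_bonds, molecule = gn_round(atoms, bonds, bond_feats)

# Relabel the atoms: new position k holds the old atom perm[k].
perm = [2, 0, 1]
old_to_new = {old: new for new, old in enumerate(perm)}
atoms_p = [atoms[old] for old in perm]
bonds_p = [(old_to_new[i], old_to_new[j]) for i, j in bonds]
new_atoms_p, new_bonds_p, molecule_p = gn_round(atoms_p, bonds_p, bond_feats)
```

Because every aggregation is a sum over neighbors, the outputs do not depend on the order in which atoms are listed: `new_atoms_p` is `new_atoms` reordered by `perm`, and `molecule_p` equals `molecule`. The three return values correspond to the per-atom, per-bond, and per-molecule targets of the multitask setting.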