Monday 4:20 PM–5:00 PM in Track 2

Bayesian inference in computational chemistry.

Chaya D. Stern

Audience level:


Drug discovery projects have a 95% failure rate. In this talk I will show how I used PyMC, a Python probabilistic programming library, to improve the molecular models used in the early stages of drug discovery to eliminate unlikely drug candidates. In addition, I will discuss how I propagated model error using reweighting techniques.


The cost to develop a new drug currently exceeds 2.5 billion dollars. Drug discovery projects take thirteen years on average and have a 95% failure rate. Molecular simulations of drug-like molecules and disease targets have enormous potential to reduce the cost and duration of a project by eliminating unlikely drug candidates early in the development process. These simulations are based on models of inter- and intramolecular forces. However, several sources of uncertainty continue to limit the applications of this method. The models used in the simulations need to be parameterized to reproduce experimental data, and the parameterization process introduces a systematic error that is currently not quantifiable.

My dissertation project focuses on quantifying this model uncertainty and its contribution to predicted drug properties. We cast the parameterization problem as a statistical inference problem and adopt a Bayesian probabilistic framework to automate parameterization. Because the result of Bayesian inference is a probability distribution over the parameters, we can propagate the systematic error to physical properties that are calculated from computer simulations. This gives us a rigorous error estimate that tells us how certain the predictions are, and that can be used to select compounds optimally in the drug discovery pipeline.

In this talk I will show how I used PyMC, a Python probabilistic programming library, to build the model and to sample the posterior. I will discuss some of the model selection and sampling problems I encountered and the algorithms I used to overcome them. In addition, I will show how I used reweighting techniques to propagate the uncertainty to computed properties.
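
To make the workflow concrete, here is a minimal sketch of the two ideas in this abstract: sampling a posterior over a model parameter, then reweighting existing simulation samples to propagate that uncertainty to a computed property. It is not the code from the talk. It uses a hand-written Metropolis sampler from the Python standard library in place of PyMC, and everything in it is an illustrative assumption: the one-term torsion energy, the synthetic "reference" energies, the noise level `sigma`, the flat prior bounds, and the property (the Boltzmann average of cos(phi)).

```python
import math
import random
import statistics

KT = 0.593  # thermal energy in kcal/mol near 298 K


def torsion_energy(phi, k):
    """Toy one-term torsion energy k * (1 - cos(phi)), with its minimum at phi = 0."""
    return k * (1.0 - math.cos(phi))


def log_posterior(k, phis, ref_energies, sigma=0.2):
    """Gaussian likelihood around the reference energies, flat prior on 0 < k < 10."""
    if not 0.0 < k < 10.0:
        return -math.inf
    sq_err = sum((e - torsion_energy(p, k)) ** 2 for p, e in zip(phis, ref_energies))
    return -0.5 * sq_err / sigma ** 2


def log_boltzmann(phi, k):
    """Log of the unnormalized Boltzmann density on [-pi, pi]."""
    if abs(phi) > math.pi:
        return -math.inf
    return -torsion_energy(phi, k) / KT


def metropolis(logp, start, n_steps, step, rng):
    """Random-walk Metropolis sampler for one scalar variable."""
    x, lp, samples = start, logp(start), []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_new = logp(proposal)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples


def reweighted_mean_cos(configs, k_ref, k_new):
    """Estimate <cos(phi)> under k_new from configurations sampled under k_ref."""
    log_w = [-(torsion_energy(p, k_new) - torsion_energy(p, k_ref)) / KT for p in configs]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return sum(w * math.cos(p) for w, p in zip(weights, configs)) / sum(weights)


rng = random.Random(0)

# Synthetic reference data: energies on a torsion scan, generated with k = 2.0 plus noise.
phis = [-math.pi + i * math.pi / 6 for i in range(12)]
ref_energies = [torsion_energy(p, 2.0) + rng.gauss(0.0, 0.2) for p in phis]

# Step 1: sample the posterior over the force constant k, discarding burn-in.
k_samples = metropolis(lambda k: log_posterior(k, phis, ref_energies), 1.0, 5000, 0.1, rng)[1000:]
mean_k = statistics.mean(k_samples)

# Step 2: run one "simulation" at a single reference parameter value.
k_ref = mean_k
configs = metropolis(lambda p: log_boltzmann(p, k_ref), 0.0, 20000, 1.0, rng)[2000::10]

# Step 3: reweight that one simulation to each posterior sample of k.
predictions = [reweighted_mean_cos(configs, k_ref, k) for k in k_samples[::20]]
mean_pred = statistics.mean(predictions)
std_pred = statistics.stdev(predictions)

print(f"posterior mean of k: {mean_k:.3f}")
print(f"<cos(phi)> = {mean_pred:.4f} +/- {std_pred:.4f}")
```

The spread of `predictions` reflects only the parameter uncertainty, because every prediction reuses the same set of configurations. Reweighting of this kind is reliable only when the posterior samples stay close to the reference parameter, so that a few configurations do not dominate the weights.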