Sunday 11:00–11:45 in Auditorium

Bayesian Deep Learning with Edward (and a trick using Dropout)

Andrew Rowan

Audience level:
Experienced

Description

Bayesian neural networks have seen a resurgence of interest as a way of generating model uncertainty estimates. I use Edward, a new probabilistic programming framework extending Python and TensorFlow, to perform inference on deep neural networks for several benchmark data sets, and compare the results with dropout training, which has recently been shown to be formally equivalent to approximate Bayesian inference.

Abstract

Deep learning methods represent the state of the art for many applications such as speech recognition, computer vision and natural language processing. Conventional approaches generate point estimates of deep neural network weights and hence make predictions that can be overconfident, since they do not adequately account for uncertainty in the model parameters. However, a means of quantifying the uncertainty of our predictions is often a critical requirement in fields such as medicine, engineering and finance. One natural response is to consider Bayesian methods, which offer a principled way of estimating predictive uncertainty while also showing robustness to overfitting.
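In standard notation (not taken from the talk itself), the quantity a Bayesian treatment targets is the posterior predictive distribution, which averages the network's output over the posterior on the weights w given the training data D:

```latex
p(\mathbf{w} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{\int p(\mathcal{D} \mid \mathbf{w}')\, p(\mathbf{w}')\, d\mathbf{w}'},
\qquad
p(y^{*} \mid x^{*}, \mathcal{D}) = \int p(y^{*} \mid x^{*}, \mathbf{w})\, p(\mathbf{w} \mid \mathcal{D})\, d\mathbf{w}
```

A point estimate \hat{w} collapses this integral to the single term p(y* | x*, \hat{w}), which is why it ignores uncertainty in the weights.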

Bayesian neural networks have a long history. Exact Bayesian inference on network weights is generally intractable, and much work in the 1990s focused on variational and Monte Carlo-based approximations [1-3]. However, these methods did not scale well to modern applications. Recently, the field has seen a resurgence of interest, with the aim of constructing practical, scalable techniques for approximate Bayesian inference on more complex models, deep architectures and larger data sets [4-10].
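Variational inference replaces the intractable posterior p(w | D) with a tractable distribution q(w) chosen to maximise the evidence lower bound, ELBO(q) = E_q[log p(D | w) + log p(w) - log q(w)] = log p(D) - KL(q(w) || p(w | D)). The NumPy sketch below is an illustration, not code from the talk: it checks that identity on Bayesian linear regression, a hypothetical toy model chosen because its exact posterior and evidence are available in closed form. All function names are hypothetical. The Monte Carlo ELBO, computed with the reparameterisation w = mu + L eps used in [7, 8], matches log p(D) when q is the exact posterior and falls below it for a poor choice of q.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at each row of x (x may be 1-D for a single point)."""
    d = mean.shape[0]
    L = np.linalg.cholesky(cov)
    diff = np.atleast_2d(x) - mean                  # (n_points, d)
    z = np.linalg.solve(L, diff.T)                  # L^{-1}(x - mean), shape (d, n_points)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (np.sum(z**2, axis=0) + log_det + d * np.log(2.0 * np.pi))

def log_likelihood(W, X, y, sigma):
    """log p(y | X, w) for each row w of W, under y ~ N(Xw, sigma^2 I)."""
    resid = y - W @ X.T                             # (n_samples, n_data)
    n = y.shape[0]
    return -0.5 * np.sum(resid**2, axis=1) / sigma**2 - 0.5 * n * np.log(2.0 * np.pi * sigma**2)

def elbo_estimate(q_mean, q_cov, X, y, alpha, sigma, n_samples, rng):
    """Monte Carlo ELBO: mean of log p(y|w) + log p(w) - log q(w) over w ~ q."""
    d = q_mean.shape[0]
    L = np.linalg.cholesky(q_cov)
    eps = rng.standard_normal((n_samples, d))
    W = q_mean + eps @ L.T                          # reparameterisation: w = mu + L eps
    log_prior = mvn_logpdf(W, np.zeros(d), np.eye(d) / alpha)   # p(w) = N(0, I / alpha)
    log_q = mvn_logpdf(W, q_mean, q_cov)
    return np.mean(log_likelihood(W, X, y, sigma) + log_prior - log_q)

def exact_posterior(X, y, alpha, sigma):
    """Closed-form Gaussian posterior over w for Bayesian linear regression."""
    precision = alpha * np.eye(X.shape[1]) + X.T @ X / sigma**2
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)                       # symmetrise against round-off
    return cov @ X.T @ y / sigma**2, cov

def log_evidence(X, y, alpha, sigma):
    """log p(y | X): y is Gaussian with covariance sigma^2 I + X X^T / alpha."""
    n = X.shape[0]
    return mvn_logpdf(y, np.zeros(n), sigma**2 * np.eye(n) + X @ X.T / alpha)[0]

# Hypothetical synthetic data set.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.standard_normal(50)
alpha, sigma = 1.0, 0.5

post_mean, post_cov = exact_posterior(X, y, alpha, sigma)
log_ev = log_evidence(X, y, alpha, sigma)
elbo_exact = elbo_estimate(post_mean, post_cov, X, y, alpha, sigma, 1000, rng)
elbo_bad = elbo_estimate(np.zeros(3), np.eye(3), X, y, alpha, sigma, 1000, rng)
print(log_ev, elbo_exact, elbo_bad)
```

For a neural network neither side of the identity is available in closed form, which is why variational methods such as [9] instead optimise a Monte Carlo estimate of the ELBO with respect to the parameters of q.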

Edward is a new, Turing-complete probabilistic programming language built on Python [11]. Probabilistic programming frameworks typically face a trade-off between the range of models that can be expressed and the efficiency of inference engines. Edward can leverage graph frameworks such as TensorFlow to enable fast distributed training, parallelism, vectorisation, and GPU support, while also allowing composition of both models and inference methods for a greater degree of flexibility.

In this talk I will give a brief overview of developments in Bayesian deep learning and demonstrate results of Bayesian inference on deep architectures implemented in Edward for a range of publicly available data sets. Dropout is an empirical technique that has been applied very successfully to reduce overfitting in deep learning models [12]. Recent work by Gal and Ghahramani [13] has demonstrated a surprising formal equivalence between dropout and approximate Bayesian inference in neural networks. I will compare the results of inference performed with Edward against model averaging over neural networks trained with dropout.
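Model averaging with dropout, called Monte Carlo (MC) dropout in [13], keeps dropout switched on at prediction time and averages several stochastic forward passes; the spread of those passes serves as an uncertainty estimate. The following is a minimal NumPy sketch, not the talk's implementation, assuming a hypothetical one-hidden-layer ReLU regression network whose weights are random placeholders rather than trained values. The helper name `mc_dropout_predict` and the choice to apply dropout to the hidden activations are illustrative assumptions.

```python
import numpy as np

def mc_dropout_predict(x, weights, biases, p_drop, n_samples, rng):
    """Predictive mean and variance from n_samples forward passes with dropout kept on.

    weights/biases describe a ReLU MLP; dropout (keep probability 1 - p_drop,
    0 <= p_drop < 1) is applied to every hidden layer's activations.
    """
    passes = []
    for _ in range(n_samples):
        h = x
        for i, (W, b) in enumerate(zip(weights, biases)):
            h = h @ W + b
            if i < len(weights) - 1:                    # hidden layers only
                h = np.maximum(h, 0.0)                  # ReLU
                keep = rng.random(h.shape) >= p_drop    # fresh Bernoulli mask on every pass
                h = h * keep / (1.0 - p_drop)           # inverted-dropout rescaling
        passes.append(h)
    passes = np.stack(passes)                           # (n_samples, n_points, n_out)
    return passes.mean(axis=0), passes.var(axis=0)

# Hypothetical, untrained 1-50-1 network with random placeholder weights.
rng = np.random.default_rng(42)
weights = [rng.normal(0.0, 1.0, (1, 50)), rng.normal(0.0, 0.2, (50, 1))]
biases = [rng.normal(0.0, 1.0, 50), np.zeros(1)]
x = np.linspace(-2.0, 2.0, 5)[:, None]                  # 5 test inputs, shape (5, 1)

mean, var = mc_dropout_predict(x, weights, biases, p_drop=0.5, n_samples=200, rng=rng)
mean0, var0 = mc_dropout_predict(x, weights, biases, p_drop=0.0, n_samples=10, rng=rng)
print(np.column_stack([x[:, 0], mean[:, 0], np.sqrt(var[:, 0])]))
```

For regression, the predictive variance derived in [13] also adds an observation-noise term (the inverse model precision) to this sample variance; it is omitted here. Setting `p_drop=0.0` makes every pass identical, so the variance collapses to zero, which is a quick sanity check.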

[1] MacKay, D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3): 448–472, 1992.
[2] Neal, R M. Bayesian learning for neural networks. PhD thesis, University of Toronto, 1995.
[3] Hinton, G E and Van Camp, D. Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, 1993.
[4] Graves, A. Practical variational inference for neural networks. NIPS, 2011.
[5] Kingma, D P, Salimans, T, and Welling, M. Variational dropout and the local reparameterization trick. https://arxiv.org/pdf/1506.02557, 2015.
[6] Mnih, A and Gregor, K. Neural variational inference and learning in belief networks. ICML, 2014.
[7] Kingma, D P and Welling, M. Auto-encoding variational Bayes. CoRR abs/1312.6114, 2013.
[8] Rezende, D, Mohamed, S, and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. ICML, 2014.
[9] Blundell, C, Cornebise, J, Kavukcuoglu, K, and Wierstra, D. Weight uncertainty in neural networks. ICML, 2015.
[10] Hernandez-Lobato, J M and Adams, R P. Probabilistic backpropagation for scalable learning of Bayesian neural networks. ICML, 2015.
[11] Tran, D, Kucukelbir, A, Dieng, A B, Rudolph, M, Liang, D, and Blei, D M. Edward: A library for probabilistic modeling, inference, and criticism. arXiv:1610.09787, 2016.
[12] Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I, and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 2014.
[13] Gal, Y and Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. ICML, 2016.