Wednesday 3:30 PM–5:00 PM in Track 2

Two views on regression with PyMC3 and scikit-learn

Colin Carroll

Audience level:
Intermediate

Description

PyMC3 is a Python library that allows you to specify a statistical model in a natural way, and then reason about it in the presence of data. This talk will compare the approaches from PyMC3 and the popular scikit-learn library in fitting regression models, and in applying regularization.

Abstract

Python has become one of the most popular languages for machine learning, due in no small part to exceptional numeric libraries (NumPy, SciPy) that act as building blocks for higher-level data analysis and machine learning libraries (pandas, scikit-learn). A side effect of the recent rise of deep learning frameworks (Theano, TensorFlow, PyTorch) has been to enable efficient sampling from complex statistical models, which in turn serves as a building block for probabilistic modeling libraries like PyMC3 and Edward.

In this talk, we will review how to solve regression problems using scikit-learn, and then show how to implement the same models in PyMC3. We will then extend these models to include regularization in both libraries, and discuss the geometric and statistical assumptions each approach makes. Finally, we will reflect on why the existence of these two viewpoints is both algorithmically and mathematically beautiful.
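
As a rough illustration of the two viewpoints (a minimal sketch that is not taken from the talk itself), the snippet below uses only NumPy and synthetic data. It assumes a linear model with known noise scale `sigma` and no intercept; the names `alpha`, `sigma`, and `tau` are chosen here for illustration. The geometric view minimizes squared error plus an L2 penalty, which is the objective scikit-learn's `Ridge` minimizes. The statistical view places a zero-mean Gaussian prior on the weights and takes the posterior mode, which is roughly what one would expect a PyMC3 model with Normal priors to recover as its MAP estimate. With `tau = sigma / sqrt(alpha)` the two estimates coincide.

```python
import numpy as np

# Synthetic data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5])
sigma = 0.5  # noise standard deviation, assumed known
y = X @ true_w + rng.normal(scale=sigma, size=n)

alpha = 2.0  # strength of the L2 penalty

# Unregularized least squares, for comparison.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Geometric view: minimize ||y - Xw||^2 + alpha * ||w||^2.
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Statistical view: y ~ Normal(Xw, sigma^2), w ~ Normal(0, tau^2 I).
# The posterior over w is Gaussian, so its mode (the MAP estimate)
# solves a linear system in the posterior precision matrix.
tau = sigma / np.sqrt(alpha)
precision = X.T @ X / sigma**2 + np.eye(d) / tau**2
w_map = np.linalg.solve(precision, X.T @ y / sigma**2)

print("OLS:  ", w_ols)
print("Ridge:", w_ridge)
print("MAP:  ", w_map)
```

The penalty strength and the prior width carry the same information: a larger `alpha` corresponds to a narrower prior (smaller `tau`), and both pull the weights toward zero. A sampling-based approach would additionally give a full posterior distribution rather than a single point estimate.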