Tuesday 2:25 PM–3:05 PM in Track 1

Turning PyMC3 into scikit-learn

Nicole Carlson

Audience level:


PyMC3 is a probabilistic modeling library. Most examples of how to use the library live in Jupyter notebooks, and the path from a notebook to a reusable, production-ready model is not well documented. My goal is to show a custom Bayesian Model class that implements the scikit-learn API. After this talk, you should be able to build your own reusable PyMC3 models.


Scikit-learn is the standard library for data science in part because its simple API makes it easy for newcomers to train models. However, the library may not have the model you need for your specific problem.

In contrast, PyMC3 lets you create almost any model you want using its probabilistic modeling framework. But most of the examples of how to use the library are in Jupyter notebooks, and they often only demonstrate how to train on a single set of data.

What I’ve found missing are the steps between creating a PyMC3 model and reusing that model with new data in production.

This talk will go over an example of how to train a model in scikit-learn, save it for later use, and reload it to use with new data. Then, I will map those steps to the corresponding methods in PyMC3.
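
As a rough illustration of the scikit-learn side of that workflow, the train, save, reload, and predict steps might look like the sketch below. The synthetic data, the choice of `LogisticRegression`, and the file name are assumptions made for the example, not details taken from the talk.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): the label depends on the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] > 0).astype(int)

# 1. Train the model.
model = LogisticRegression()
model.fit(X_train, y_train)

# 2. Save the fitted model to disk (a temporary directory keeps the example tidy).
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# 3. Reload it later, possibly in a different process.
with open(model_path, "rb") as f:
    reloaded = pickle.load(f)

# 4. Use the reloaded model on new data.
X_new = rng.normal(size=(5, 2))
predictions = reloaded.predict(X_new)
probabilities = reloaded.predict_proba(X_new)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.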

The bulk of the talk will be a demonstration of a custom Hierarchical Logistic Regression class that’s built on top of the scikit-learn API. At the end of the talk, you should be able to use this model as a template for any of your own PyMC3 models.
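
To give a feel for what such a template could look like, here is a minimal sketch of a base class with a scikit-learn-style interface. It is not the class presented in the talk: the names `BayesianModel`, `_infer`, `_predict_proba`, and `fitted_state` are hypothetical, and the PyMC3-specific work (defining the model, running inference, sampling predictions) is deliberately left to the two abstract hooks that a subclass would fill in.

```python
import pickle
from abc import ABC, abstractmethod

import numpy as np


class BayesianModel(ABC):
    """Hypothetical base class exposing a scikit-learn-style interface."""

    def __init__(self):
        # Whatever inference produces, e.g. posterior samples or fitted parameters.
        self.fitted_state = None

    @abstractmethod
    def _infer(self, X, y):
        """Build the model, run inference, and return the state needed to predict."""

    @abstractmethod
    def _predict_proba(self, fitted_state, X):
        """Return the probability of class 1 for each row of X."""

    def fit(self, X, y):
        self.fitted_state = self._infer(np.asarray(X), np.asarray(y))
        return self  # returning self follows the scikit-learn convention

    def predict_proba(self, X):
        if self.fitted_state is None:
            raise RuntimeError("Call fit() or load() before predicting.")
        return self._predict_proba(self.fitted_state, np.asarray(X))

    def predict(self, X, threshold=0.5):
        # Turn probabilities into 0/1 labels.
        return (self.predict_proba(X) > threshold).astype(int)

    def save(self, path):
        # Persist only the fitted state, not the whole object.
        with open(path, "wb") as f:
            pickle.dump(self.fitted_state, f)

    def load(self, path):
        with open(path, "rb") as f:
            self.fitted_state = pickle.load(f)
        return self
```

A concrete subclass would implement `_infer` and `_predict_proba` with its own PyMC3 model. Keeping the persisted state separate from the model definition means a fresh instance can call `load` and predict on new data without retraining.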