Friday October 29 3:30 PM – Friday October 29 4:00 PM in Talks II

Let's Implement Bayesian Ordered Logistic Regression!

Marco Edward Gorelli

Prior knowledge:
Previous knowledge expected
Bayesian inference

Summary

If you work in data science / statistics, you've probably run classification tasks. But what if your categories have some ordering? You might have just used regression and binned the output somehow, but is there a principled, Bayesian way to do this? And what if you have an ordered, categorical feature?

In this talk, you'll learn how to implement ordered logistic regression in Python!

Description

Target audience

Data scientists, statisticians, and anyone else working on prediction / inference problems. Basic familiarity with Bayesian inference and statistics will be assumed. The talk will involve some maths but will include programmatic examples of every topic covered. The overall tone will be somewhat light-hearted.

Why would we want ordered logistic regression anyway?

Say you're trying to predict survey responses on a scale of 1 to 7. You can't just treat them as 7 unrelated categories, as they have an inherent ordering to them. But you also probably don't want to use plain old regression, as the gap between 1 and 2 is likely different to the gap between 3 and 4. So, you may want to turn to ordered logistic regression, and after this talk you'll be well-equipped to use it in practice.

Bayesian logistic regression

You may have used logistic regression from the scikit-learn library in a "black-box" manner. But do you remember how it works? As a precursor to the next section, and as a bit of revision, we will implement logistic regression from scratch, in Python, using NumPyro (a probabilistic programming library).
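
As a reminder of what sits inside the black box, here is a minimal sketch of the unnormalised log posterior for Bayesian logistic regression. It uses plain NumPy rather than NumPyro, and the Normal(0, prior_scale) prior, the function names and the toy data are all assumptions made for illustration, not material from the talk.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def log_posterior(beta, X, y, prior_scale=1.0):
    """Unnormalised log posterior of the coefficients `beta`.

    Likelihood: y_i ~ Bernoulli(sigmoid(X_i @ beta)).
    Prior (an assumption for this sketch): beta_j ~ Normal(0, prior_scale).
    """
    logits = X @ beta
    # Bernoulli log-likelihood, using 1 - sigmoid(z) = sigmoid(-z).
    log_lik = np.sum(y * log_sigmoid(logits) + (1 - y) * log_sigmoid(-logits))
    # Normal log-prior, dropping terms that do not depend on beta.
    log_prior = -0.5 * np.sum((beta / prior_scale) ** 2)
    return log_lik + log_prior


# Toy data (hypothetical): an intercept column plus one feature.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])
y = np.array([0, 0, 1])

lp_zero = log_posterior(np.zeros(2), X, y)
lp_good = log_posterior(np.array([-1.0, 1.0]), X, y)
```

On this toy data, coefficients that push the positive example towards 1 score a higher log posterior than all-zero coefficients. A sampler such as the ones a probabilistic programming library provides explores exactly this kind of function.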

Bayesian ordered logistic regression

Having revised logistic regression, we will learn how to adapt it to deal with ordered categories, both in the input and output spaces. We will go over the theory, an implementation in NumPyro, and see what this reveals in a real-world dataset.
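
For the output side, the usual cumulative-logit construction can be sketched in a few lines of NumPy: ordered cutpoints c_1 < ... < c_{K-1} split the latent scale, P(y <= k) = sigmoid(c_k - eta), and each category's probability is the difference between neighbouring cumulative probabilities. The function name, the cutpoint values and the zero-based category labels below are assumptions for illustration; the talk's own NumPyro implementation may be organised differently.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities for K = len(cutpoints) + 1 ordered categories.

    eta:       linear predictor, shape (n,)
    cutpoints: increasing thresholds, shape (K - 1,)
    Returns an array of shape (n, K) whose rows sum to 1.
    """
    eta = np.asarray(eta, dtype=float)[..., None]
    # Cumulative probabilities P(y <= k) for k = 0, ..., K - 2.
    cum = sigmoid(np.asarray(cutpoints, dtype=float) - eta)
    # Pad with P(y <= -1) = 0 and P(y <= K - 1) = 1.
    zeros = np.zeros(cum.shape[:-1] + (1,))
    ones = np.ones(cum.shape[:-1] + (1,))
    cum = np.concatenate([zeros, cum, ones], axis=-1)
    # Each category's mass is the gap between consecutive cumulative values.
    return np.diff(cum, axis=-1)


cutpoints = np.array([-1.0, 0.0, 1.0])  # hypothetical values, 4 categories
probs = ordered_logistic_probs([0.0, 2.0], cutpoints)
```

A larger linear predictor shifts probability mass towards the higher categories, which is how the ordering enters the model. In a Bayesian treatment both the coefficients behind eta and the cutpoints receive priors, with the cutpoints constrained to be increasing.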

Extra: non-Bayesian approach

In my line of work, I often care about quantifying uncertainty. However, that's not the case for everyone. What alternative to Bayesian ordered logistic regression is there if you only need point predictions? We'll look at an approach which is popular on the data science competition platform Kaggle.
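
The abstract does not name the approach, so the following is only a guess at the general shape of a point-prediction alternative: fit an ordinary regressor, then bin its continuous output into ordered categories with a set of thresholds. The helper name, threshold values and predictions are hypothetical, and in practice the thresholds would presumably be tuned on validation data rather than fixed by hand.

```python
import numpy as np


def bin_predictions(preds, thresholds):
    """Map continuous predictions to ordered labels 0..K-1.

    thresholds must be increasing; a prediction x gets label k when
    thresholds[k - 1] <= x < thresholds[k].
    """
    return np.digitize(preds, thresholds)


# Hypothetical regressor output and hand-picked thresholds (4 categories).
thresholds = np.array([0.5, 1.5, 2.5])
preds = np.array([-0.2, 0.7, 1.49, 2.6, 9.0])
labels = bin_predictions(preds, thresholds)
```

Unlike the Bayesian model, this yields a single label per observation and says nothing about how uncertain that label is.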

Takeaways

You will have learned how Bayesian Ordered Logistic Regression works and how to implement it. I will make all resources for the talk available on GitHub.