PyData Amsterdam 2019 - Presentation: What’s the uncertainty on your ML prediction?

Providing uncertainties on predictions is crucial for making well-informed decisions. Surprisingly, estimating uncertainties on individual predictions is uncommon, and none of the common ML libraries do it. To fill this gap, we have implemented maximum-likelihood-based uncertainty estimation in Python for machine learning algorithms such as logistic regression and neural networks.
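For reference, the standard maximum-likelihood recipe can be written compactly as below. This is the textbook form, not necessarily the exact variant used in the talk. It assumes that the loss $L(\theta)$ is the negative log-likelihood of the model weights $\theta$ and that it is approximately quadratic near its minimum. The notation is ours.

```latex
\hat{\theta} = \arg\min_{\theta} L(\theta), \qquad L(\theta) = -\log \mathcal{L}(\theta \mid \text{data})

\operatorname{Cov}(\hat{\theta}) \approx H^{-1}, \qquad
H_{ij} = \left.\frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j}\right|_{\theta = \hat{\theta}}

\sigma_f^2(x) \approx \nabla_{\theta} f(x; \hat{\theta})^{\top} \, H^{-1} \, \nabla_{\theta} f(x; \hat{\theta})
```

The first line is the fit. The second estimates the uncertainty on the weights from the curvature of the loss. The third propagates that uncertainty to a single prediction $f(x)$ to first order.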

Uncertainties are invaluable for decision making. Say you expect 10 people at your dinner party. If it could easily be 8 more, you would prepare a very different amount of food than if it could be at most one more. In the sciences, especially the physical sciences, it is common practice to estimate uncertainties on predictions in the form of, e.g., errors, intervals, or limits. Yet in data science these are rarely seen. In machine learning (ML) problems one often calculates global uncertainties, namely performance estimates such as precision, recall and AUC. Uncertainties on individual predictions, however, are seldom estimated. They could inform decisions both in a business setting, where the model runs in a production environment, and during exploratory analysis, when determining the viability of a project or model. The shortage of tooling for this purpose inspired the ING Wholesale Banking Advanced Analytics team to create a Python library that estimates uncertainties and makes them part of common (machine learning) algorithms, such as regressors, classifiers and neural networks. The package can be found at: https://github.com/faab5/errortools/.

In this presentation we explain the mathematics that allows one to make statistical uncertainty estimates for models that minimize a loss function. Calculating uncertainties on predictions is split into two steps: estimating the statistical uncertainties on the model weights, and propagating these to the predictions. We show common approximations in both steps that allow for fast or even analytical calculations. Using examples, we demonstrate how to use our package to incorporate uncertainty estimates into daily machine learning practice. Finally, we show what these uncertainties look like for several common machine learning algorithms and how we visualize them.
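The two steps described above can be illustrated with a minimal, dependency-free sketch for a one-feature logistic regression. It assumes the loss is the negative log-likelihood and fits the weights with Newton-Raphson. It approximates the weight covariance by the inverse Hessian at the optimum. It then propagates that covariance to a single prediction with first-order (delta-method) error propagation. The function names (`fit_with_covariance`, `predict_with_uncertainty`) and the synthetic data are hypothetical illustrations, not the errortools API.

```python
import math
import random

# Hypothetical helpers for illustration only; this is not the errortools API.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll_grad_hess(b0, b1, xs, ys):
    """Gradient and Hessian of the negative log-likelihood of
    p(y=1 | x) = sigmoid(b0 + b1 * x) with respect to the weights (b0, b1)."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        g[0] += p - y
        g[1] += (p - y) * x
        h[0][0] += w
        h[0][1] += w * x
        h[1][1] += w * x * x
    h[1][0] = h[0][1]
    return g, h


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def fit_with_covariance(xs, ys, n_iter=25):
    """Step 1: maximum-likelihood fit via Newton-Raphson, plus the weight
    covariance approximated by the inverse Hessian at the optimum."""
    b = [0.0, 0.0]
    for _ in range(n_iter):
        g, h = nll_grad_hess(b[0], b[1], xs, ys)
        hi = inv2(h)
        # Newton step: b <- b - H^{-1} g
        b[0] -= hi[0][0] * g[0] + hi[0][1] * g[1]
        b[1] -= hi[1][0] * g[0] + hi[1][1] * g[1]
    _, h = nll_grad_hess(b[0], b[1], xs, ys)
    return b, inv2(h)


def predict_with_uncertainty(b, cov, x):
    """Step 2: propagate the weight covariance to one prediction with
    first-order (linear) error propagation. Returns (p, sd of p, sd of z)."""
    z = b[0] + b[1] * x
    # Var(z) = J^T C J with Jacobian J = dz/db = (1, x)
    var_z = cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1]
    sd_z = math.sqrt(var_z)
    p = sigmoid(z)
    # dp/dz = p (1 - p), so sd(p) is approximately p (1 - p) sd(z)
    return p, p * (1.0 - p) * sd_z, sd_z


# Synthetic data from known (hypothetical) true weights.
rng = random.Random(0)
true_b0, true_b1 = -0.5, 2.0
xs = [rng.uniform(-2.0, 2.0) for _ in range(500)]
ys = [1 if rng.random() < sigmoid(true_b0 + true_b1 * x) else 0 for x in xs]

b, cov = fit_with_covariance(xs, ys)
for x in (0.0, 3.0):
    p, sd_p, sd_z = predict_with_uncertainty(b, cov, x)
    print(f"x = {x}: p = {p:.3f} +/- {sd_p:.3f} (score sd {sd_z:.3f})")
```

The uncertainty on the linear score grows for x = 3.0, which lies outside the training range. On the probability scale the delta-method error also shrinks by the factor p(1 - p) near 0 or 1. Propagating an interval on the score through the sigmoid instead is a common alternative that keeps the bounds inside [0, 1].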

Saturday 14:45–15:20 in Auditorium

Eva van Weel, Fabian Jansen
