Monday 3:40 PM–4:25 PM in Central Park West (6501)

Quantifying uncertainty in machine learning models

Samuel Rochette

Audience level:
Intermediate

Description

Many models expose far more information during inference than we usually use. We will begin with an intrinsic estimate of the full predictive distribution given by the random forest algorithm. Then we will extend those "prediction intervals" to almost every regression model thanks to the quantile loss. Finally, we will discuss probability calibration as a way to measure uncertainty in classification.

Abstract

We'll see why it is important to quantify uncertainty in inferential statistics and predictive machine learning models, and how to do it.

1) Deep dive into random forests

Random forests naturally give us an estimate of the predictive distribution for each sample, thanks to the bagging technique: each tree in the ensemble produces its own prediction, and the spread of those predictions reflects the model's uncertainty.
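
As a minimal sketch of this idea (the scikit-learn API is real; the toy dataset and the 90% interval bounds are illustrative choices, not from the talk):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Each tree in the bagged ensemble yields its own prediction; the spread of
# these per-tree predictions approximates a predictive distribution per sample.
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])

# Empirical 5th and 95th percentiles give an approximate 90% prediction interval.
lower = np.percentile(per_tree, 5, axis=0)
upper = np.percentile(per_tree, 95, axis=0)
print(lower[:3], upper[:3])
```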

2) Generalising to other regression models

The quantile (pinball) loss is useful for computing prediction intervals with almost any regression model. It is, however, computationally costly, since a separate model must be fit for each quantile of interest. Certain losses, such as log-cosh, can help with this drawback.
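
A minimal sketch of quantile regression with scikit-learn's gradient boosting (loss="quantile" and the alpha parameter are part of the scikit-learn API; the 5%/50%/95% quantiles and the toy data are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pinball (quantile) loss for quantile tau is
#   L_tau(y, p) = max(tau * (y - p), (tau - 1) * (y - p)),
# so fitting one model per quantile yields the bounds of a prediction interval.
models = {}
for alpha in (0.05, 0.5, 0.95):
    gbr = GradientBoostingRegressor(loss="quantile", alpha=alpha, random_state=0)
    models[alpha] = gbr.fit(X_train, y_train)

lower = models[0.05].predict(X_test)
median = models[0.5].predict(X_test)
upper = models[0.95].predict(X_test)
print(lower[:3], median[:3], upper[:3])
```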

3) What about classification?

In classification, the predicted probability is a measure of uncertainty... but does every model give us good probabilities? Let's plot some reliability curves to check whether we need to calibrate the output with a sigmoid (Platt scaling) or isotonic regression!
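
A minimal sketch of checking and fixing calibration (calibration_curve and CalibratedClassifierCV are scikit-learn APIs; the choice of GaussianNB as a typically poorly calibrated model and the toy data are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = GaussianNB().fit(X_train, y_train)
prob_raw = raw.predict_proba(X_test)[:, 1]

# Reliability curve: observed frequency of positives vs. mean predicted
# probability in each bin; a well-calibrated model lies on the diagonal.
frac_pos, mean_pred = calibration_curve(y_test, prob_raw, n_bins=10)
plt.plot(mean_pred, frac_pos, marker="o", label="uncalibrated")
plt.plot([0, 1], [0, 1], "k--", label="perfectly calibrated")

# Post-hoc calibration with a sigmoid (Platt scaling) or isotonic regression.
for method in ("sigmoid", "isotonic"):
    calibrated = CalibratedClassifierCV(GaussianNB(), method=method, cv=5)
    calibrated.fit(X_train, y_train)
    prob = calibrated.predict_proba(X_test)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    plt.plot(mean_pred, frac_pos, marker="o", label=method)

plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
```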
