Saturday 11:50–12:25 in Auditorium

A walk through the isolation forest

Jan van der Vegt

Audience level:
Intermediate

Description

Anomaly detection has many use cases. In this talk we will look at one specific approach to the problem: Isolation Forests. After covering the basics we will look at several extensions of the algorithm, one taken directly from a paper and others that we derive ourselves to better fit our use case. Accompanying code is available on GitHub.

Abstract

Anomaly detection can help with fraud detection, predictive maintenance and cyber security, among other applications. It can also help on a meta level in other machine learning projects by detecting outliers during training or inference. One approach to anomaly detection is the Isolation Forest. In this talk we will first go over the idea of the original isolation forest paper and then a slightly more sophisticated extension called Entropy Isolation Forests. Isolation forests have a number of appealing properties with regard to intuition, parallelism and performance, but the basic formulation lacks native support for categorical features and missing values.
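To make the basic idea concrete, here is a minimal sketch of an isolation forest for purely numerical features. It is not the code that accompanies the talk. The function names (`build_tree`, `fit_forest`, `score`) and the defaults of 100 trees and 256 samples per tree are illustrative assumptions. Each tree splits on a random feature at a random threshold, and points that are isolated after only a few splits receive a score close to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary search tree lookup
    over n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing to split on along this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaves."""
    if "split" not in tree:
        return depth + c(tree["size"])
    branch = "left" if x[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def score(forest, x):
    """Anomaly score in (0, 1): 2 ** (-mean path length / c(sample size))."""
    trees, n = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(n))
```

Fitting this on a list of equal-length numeric tuples and calling `score(forest, x)` gives higher values for points that sit far from the bulk of the data. Because the trees are built independently of one another, the fitting loop is easy to parallelise.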

By looking at the mathematical formulation and the reasoning behind it we will extend this approach to natively handle categorical features and missing values. The goal of the talk is twofold: on the one hand, an in-depth introduction to the class of Isolation Forests, and on the other hand, a look at the process of extending existing methods to suit the needs of your project. The concepts in this talk are accompanied by an implementation on GitHub.
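As an illustration of what such an extension could look like, the sketch below shows one possible split rule that accepts categorical and missing values. This is an assumption made for illustration and not necessarily the extension derived in the talk. Categorical features are split by sending a random non-empty proper subset of the categories to the left branch, and missing values (represented here as `None`) are routed to a random branch. The helper names `make_split` and `goes_left` are hypothetical, and each column is assumed to hold either only strings or only numbers.

```python
import random


def make_split(values, rng):
    """Draw a random split rule for one feature column.

    Returns ("categorical", frozenset_of_left_categories),
    ("numeric", threshold), or None if the column cannot be split.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    if all(isinstance(v, str) for v in present):
        categories = sorted(set(present))
        if len(categories) < 2:
            return None
        # Assumption: a random non-empty proper subset goes left.
        k = rng.randint(1, len(categories) - 1)
        return ("categorical", frozenset(rng.sample(categories, k)))
    lo, hi = min(present), max(present)
    if lo == hi:
        return None
    return ("numeric", rng.uniform(lo, hi))


def goes_left(rule, value, rng):
    """Decide which branch a value follows under the given rule."""
    if value is None:
        # Assumption: missing values pick a branch uniformly at random.
        return rng.random() < 0.5
    kind, param = rule
    if kind == "categorical":
        return value in param
    return value < param
```

Other choices are possible, for example sending missing values down both branches and averaging the resulting path lengths. Which option fits best depends on whether missingness itself should count as anomalous in the use case at hand.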
