Saturday 11:00–11:45 in Tower Suite 2

Why giving your algorithm ALL THE FEATURES does not always work.

Thomas Huijskens

Audience level:


We'd like to think of ML algorithms as smart, sophisticated learning machines, but they can be fooled by the different types of noise present in your data. Training an algorithm on a large set of variables, in the hope that the model will separate signal from noise, is not always the right approach. We'll discuss different approaches to feature selection and the open-source implementations available for them.


In general, I want to keep the talk light on maths and spend most of the time on practical code examples of feature selection algorithms. I want to convince the audience that feature selection pays off, and to introduce them to some of the Python frameworks that implement it.

The talk will start by motivating feature selection and introducing the three main types of feature selection methods: wrapper, filter, and embedded methods. The presentation will include diagrams illustrating each of the three types.
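To make the wrapper idea concrete, here is a minimal sketch of greedy forward selection, where candidate feature subsets are scored by the held-out accuracy of a model trained on them. It is not taken from the talk: the nearest-centroid classifier, the synthetic data, and the names centroid_accuracy and forward_selection are all assumptions chosen to keep the example dependency-free.

```python
import random


def centroid_accuracy(rows, labels, features, train_idx, test_idx):
    """Held-out accuracy of a nearest-centroid classifier restricted to `features`."""
    sums, counts = {}, {}
    for i in train_idx:
        acc = sums.setdefault(labels[i], [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += rows[i][f]
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

    def predict(i):
        point = [rows[i][f] for f in features]
        return min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )

    return sum(predict(i) == labels[i] for i in test_idx) / len(test_idx)


def forward_selection(rows, labels, n_features, train_idx, test_idx):
    """Greedily add the feature that most improves the score; stop when none does."""
    selected, best_score = [], 0.0
    remaining = set(range(n_features))
    while remaining:
        scored = [
            (centroid_accuracy(rows, labels, selected + [f], train_idx, test_idx), f)
            for f in sorted(remaining)
        ]
        score, feature = max(scored)
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
        remaining.remove(feature)
    return selected, best_score


# Hypothetical data: features 0 and 1 carry signal, features 2..9 are pure noise.
rng = random.Random(0)
n, n_features = 1000, 10
labels = [rng.randint(0, 1) for _ in range(n)]
rows = [
    [3.0 * y + rng.gauss(0, 1) if f < 2 else rng.gauss(0, 1) for f in range(n_features)]
    for y in labels
]
train_idx, test_idx = list(range(700)), list(range(700, n))

# Note: scoring on a single held-out split is a simplification; in practice
# cross-validation plus a separate final test set would be safer.
selected, best_score = forward_selection(rows, labels, n_features, train_idx, test_idx)
print(selected, round(best_score, 3))
```

The model is retrained for every candidate subset, which is what makes wrapper methods expensive compared with filter methods.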

I. Why should I care about feature selection? (3 min)

II. What makes a good feature selection algorithm? (5 min)

III. Wrapper methods: performance based (5 min)

IV. Filter methods: mutual information based (7 min)

V. Embedded methods: stability selection (7 min)

VI. Practical tips (3 min)

VII. Q&A (5 min)
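As a rough illustration of the mutual information based filter methods in part IV, the sketch below ranks discrete features by their estimated mutual information with the target, without training any model. It is an assumption-laden toy, not the speaker's implementation: the plug-in estimator only handles discrete values, and the feature names and synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, target):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(vals, target) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: one perfect feature, one noisy feature, one irrelevant feature.
rng = random.Random(0)
n = 2000
target = [rng.randint(0, 1) for _ in range(n)]
columns = {
    "copy_of_target": list(target),
    "noisy_copy": [t if rng.random() < 0.8 else 1 - t for t in target],
    "pure_noise": [rng.randint(0, 1) for _ in range(n)],
}

ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

A filter like this scores each feature on its own, so it is cheap but cannot tell when two high-scoring features are redundant with each other.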
