Saturday 10:15–11:00

Understanding Random Forests

Marc Garcia

Audience level:
Novice

Description

No machine learning algorithm dominates in every domain, but random forests are usually hard to beat by a large margin, and they have some advantages compared to other models: little input preparation is needed, they perform implicit feature selection, they are fast to train, and the resulting model can be visualized. While it is easy to get started with random forests, a good understanding of the model is key to getting the most out of them.

Abstract

This talk will cover decision trees, from the theory to their implementation in scikit-learn. An overview of ensemble methods and bagging will follow, ending with an explanation and implementation of random forests and a look at how they compare to other state-of-the-art models.

The talk will take a very practical approach, using examples and real-world cases to illustrate how to use both decision trees and random forests.

We will see how the simplicity of decision trees is a key advantage over other methods. Unlike black-box methods, or methods that are hard to represent in multivariate cases, decision trees can easily be visualized, analyzed, and debugged until we are confident the model behaves as expected. This exercise can deepen our understanding of the data and the problem, while making the model perform as well as possible.
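
As a taste of what that inspection looks like in practice, here is a minimal sketch (the dataset and tree depth are illustrative assumptions, not taken from the talk) that fits a small tree on scikit-learn's bundled iris data and prints the learned splits as plain-text rules:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a deliberately shallow tree so the printed rules stay readable.
    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # Print the learned splits as nested if/else rules, one per line.
    print(export_text(tree, feature_names=load_iris().feature_names))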

Random forests randomize and ensemble decision trees to increase their predictive power, while keeping most of the properties of the individual trees.
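
To make that mechanism concrete, below is a hand-rolled miniature random forest, a sketch of the idea rather than scikit-learn's actual RandomForestClassifier: each tree is trained on a bootstrap sample of the rows and considers only a random subset of features at each split, and predictions are combined by majority vote:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    rng = np.random.RandomState(0)

    trees = []
    for _ in range(25):
        # Bootstrap sample of the rows (bagging), plus per-split feature
        # subsampling via max_features (the "random" part of the forest).
        idx = rng.randint(0, len(X), len(X))
        t = DecisionTreeClassifier(max_features="sqrt", random_state=rng)
        trees.append(t.fit(X[idx], y[idx]))

    # Combine the trees by majority vote; training-set accuracy is shown
    # only to demonstrate the voting mechanics.
    votes = np.stack([t.predict(X) for t in trees])
    pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    print((pred == y).mean())

The row bootstrapping is exactly the bagging step; the per-split feature subsampling is what distinguishes a random forest from plain bagged trees.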

The main topics covered will include:

  • What are decision trees?
  • How are decision trees trained?
  • Understanding and debugging decision trees
  • Ensemble methods
  • Bagging
  • Random Forests
  • When should decision trees and random forests be used?
  • Python implementation with scikit-learn
  • Analysis of performance (a minimal sketch of these last two items follows this list)
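
As a hint of those last two topics, the sketch below (the dataset, fold count, and parameters are illustrative assumptions) compares a single decision tree against a random forest using 5-fold cross-validation:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Cross-validate both models on the same data; the ensemble
    # typically scores higher than any single tree.
    X, y = load_breast_cancer(return_X_y=True)
    for model in (DecisionTreeClassifier(random_state=0),
                  RandomForestClassifier(n_estimators=100, random_state=0)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, scores.mean().round(3))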
