Monday 4:20 PM–5:00 PM in Central Park East 6501a (6th fl)

Random Forests: Best Practices for the Business World

Gabby Shklovsky

Audience level:
Intermediate

Description

This talk explains best practices for using random forests successfully in the business world. It focuses on (1) preparing training data so that random forests can do what they do best, and (2) interpreting random forest results to address the concerns of business leaders who may not trust black-box algorithms.

Abstract

Quick Overview of Random Forest models

  1. What are random forests?
    • Supervised Learning method
    • Ensemble of decision trees (CART)
  2. What's random about them?
    • Random sampling of data - typically bootstrapping
    • Random sampling of possible split variables
    • Randomness disrupts "greediness" of algorithm
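
The two sources of randomness listed above can be sketched with the standard library alone. This is an illustration, not how any particular library implements it: the feature names, the row count, and the helper names `bootstrap_sample` and `candidate_split_features` are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

rows = list(range(10))  # indices of 10 hypothetical training rows
features = ["age", "tenure", "visits", "basket", "region", "plan"]  # hypothetical


def bootstrap_sample(rows, rng):
    """Draw len(rows) row indices with replacement (one tree's training set)."""
    return rng.choices(rows, k=len(rows))


def candidate_split_features(features, k, rng):
    """Pick k distinct features to consider at a single split."""
    return rng.sample(features, k)


for tree_id in range(3):
    sample = bootstrap_sample(rows, rng)
    # Rows never drawn for this tree are its "out-of-bag" rows.
    out_of_bag = sorted(set(rows) - set(sample))
    # A real forest redraws the candidates at every split, not once per tree.
    candidates = candidate_split_features(features, 2, rng)
    print(tree_id, sample, out_of_bag, candidates)
```

Because each tree sees different rows and each split sees only some of the variables, a single dominant predictor cannot be chosen greedily at the top of every tree.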

Best Practices for Preparing Training Data for Fitting Random Forests

  1. Factor out linear relationships between predictors and response
    • A strong linear relationship often "overpowers" other subtler effects
    • Let each model do what it does best
  2. Feature engineering is key
    • Use domain expertise / business knowledge as a guide
    • Explicitly define interaction effects as new predictors
    • Use multiple metrics that are proxies for the same concept as predictors
  3. Example: Using random forests to predict customer spend
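
One possible reading of the first two practices, sketched with NumPy and scikit-learn: let a linear model absorb the strong linear effect, then fit the forest to what is left over, using an explicitly engineered interaction term as a predictor. The data is synthetic and every column name is hypothetical. It is not the customer-spend example from the talk, and fitting on residuals is an assumption about what "factoring out" means here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for customer data (all names are hypothetical).
tenure = rng.uniform(0, 10, n)             # years as a customer
visits = rng.poisson(5, n).astype(float)   # visits per month
basket = rng.uniform(10, 100, n)           # average basket size

# Spend has a strong linear tenure effect plus a subtler
# visits-by-basket interaction and some noise.
spend = 50 * tenure + 0.5 * visits * basket + rng.normal(0, 5, n)

# Step 1: let a linear model absorb the strong linear relationship.
linear = LinearRegression().fit(tenure.reshape(-1, 1), spend)
residual = spend - linear.predict(tenure.reshape(-1, 1))

# Step 2: define the interaction explicitly as a new predictor and
# fit the forest on the residual rather than on raw spend.
X_rf = np.column_stack([visits, basket, visits * basket])
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_rf, residual)


def predict_spend(tenure_new, visits_new, basket_new):
    """Add the linear prediction and the forest's residual prediction."""
    X_new = np.column_stack([visits_new, basket_new, visits_new * basket_new])
    linear_part = linear.predict(np.reshape(tenure_new, (-1, 1)))
    return linear_part + forest.predict(X_new)


print(predict_spend(tenure[:3], visits[:3], basket[:3]))
```

Each model then does the job it is suited to: the regression handles the linear trend, and the forest is free to spend its splits on the subtler structure.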

Best Practices for Interpreting Random Forest Results

  1. Check that variable importances align with expectations/intuition
  2. Check directional relationship between top predictors and response
    • e.g. Do predictions of Y tend to increase as X increases?
    • Manually step through top 4-5 levels of a few trees
    • "Stress test" the model with synthetic data that varies the value of one predictor holding all else equal
  3. Example: Using random forests to predict which customers will cancel
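
The checks above can be sketched with scikit-learn as follows. The data is synthetic and the feature names are hypothetical, so this is not the cancellation example from the talk. The helper `stress_test` is likewise an illustrative name, not a library function.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for churn data (feature names are hypothetical).
feature_names = ["support_tickets", "monthly_logins", "noise"]
X = np.column_stack([
    rng.poisson(2, n),
    rng.poisson(10, n),
    rng.normal(0, 1, n),
]).astype(float)

# By construction, cancellation is more likely with many tickets
# and few logins; the third column is unrelated to the outcome.
logit = 0.8 * X[:, 0] - 0.3 * X[:, 1]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Check 1: compare variable importances with expectations.
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda pair: -pair[1])
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")

# Check 2: print the top levels of one tree to step through by hand.
tree_text = export_text(forest.estimators_[0],
                        feature_names=feature_names, max_depth=4)
print(tree_text)


# Check 3: stress test. Copy one baseline row, vary a single
# predictor, and hold everything else equal.
def stress_test(model, baseline, column, values):
    rows = np.tile(baseline, (len(values), 1))
    rows[:, column] = values
    return model.predict_proba(rows)[:, 1]  # probability of class 1


baseline = np.median(X, axis=0)
ticket_grid = np.arange(0, 8)
probs = stress_test(forest, baseline, 0, ticket_grid)
for tickets, p in zip(ticket_grid, probs):
    print(f"support_tickets={tickets}: P(cancel)={p:.2f}")
```

If the predicted probability moves in the direction the business expects as the predictor increases, that is evidence the model has learned a sensible relationship. If it does not, the feature or the training data deserves a closer look before the result is presented.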
