Tuesday 10:05 AM–10:45 AM in Central Park West (6501)

Deep Dive into scikit-learn's HistGradientBoosting Classifier and Regressor

Thomas J Fan

Audience level:
Intermediate

Description

Gradient-boosted decision trees (GBDT) are a powerful machine-learning technique known for their high predictive power on heterogeneous data. In this talk, we will explore scikit-learn's histogram-based GBDT implementation, HistGradientBoostingClassifier/Regressor, and how it compares to other GBDT libraries such as XGBoost, CatBoost, and LightGBM.

Abstract

Gradient-boosted decision trees (GBDT) are a powerful machine-learning technique known for their high predictive power on heterogeneous data. In scikit-learn 0.21, we released our own histogram-based GBDT implementation: HistGradientBoostingClassifier and HistGradientBoostingRegressor. This implementation is inspired by Microsoft's LightGBM and uses OpenMP for parallelization. In this talk, we will:

  1. Learn about the underpinnings of scikit-learn's histogram-based gradient boosting algorithm.
  2. Gain an intuition about HistGradientBoostingClassifier/Regressor's hyper-parameters.
  3. Compare the performance of scikit-learn's implementation with other GBDT libraries such as XGBoost, CatBoost, and LightGBM.

This talk is targeted at those who are familiar with machine learning and want a deeper understanding of scikit-learn's histogram-based gradient-boosted trees.

The materials for this talk can be found at github.com/thomasjpfan/pydata-2019-histgradientboosting.
