Friday 11:00–12:30 in Tower Suite 3

Mastering Gradient Boosting with CatBoost

Anna Veronika Dorogush

Audience level:
Novice

Description

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. This tutorial explains the details of using gradient boosting in practice: we will solve a classification problem using the popular GBDT library CatBoost.

Abstract

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
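To make the technique concrete, here is a minimal sketch of gradient boosting for squared error, written in plain Python with depth-1 trees (stumps) on one numeric feature. It is an illustration under simplifying assumptions, not how CatBoost is implemented; the function names (`fit_stump`, `fit_boosting`, `predict`, `mse`) and the toy data are hypothetical.

```python
# Minimal gradient boosting for squared error with decision stumps.
# Illustrative sketch only; all names and data here are made up for the example.

def fit_stump(xs, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    # Dropping the largest value guarantees that the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a noisy step from about 1 to about 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = fit_boosting(xs, ys)
baseline = (sum(ys) / len(ys), [], 0.1)  # constant predictor: the mean, no stumps
print("baseline MSE:", mse(baseline, xs, ys))
print("boosted MSE: ", mse(model, xs, ys))
```

Each round takes a small step (scaled by `learning_rate`) toward the residuals left by the previous rounds, which is why many weak stumps add up to a good fit.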

CatBoost (https://catboost.yandex) is a popular open-source gradient boosting library with a whole set of advantages:

1. CatBoost is able to incorporate categorical features in your data (like music genre or city) with no additional preprocessing.
2. CatBoost has the fastest GPU and multi-GPU training implementations of all the openly available gradient boosting libraries.
3. CatBoost predictions are 20-60 times faster than in other open-source gradient boosting libraries, which makes it possible to use CatBoost for latency-critical tasks.
4. CatBoost has a variety of tools to analyze your model.

This session is a comprehensive tutorial on using the CatBoost library. We will walk you through all the steps of building a good predictive model. We will cover such topics as:

- Working with different types of features, numerical and categorical
- Working with imbalanced datasets
- Using cross-validation
- Understanding feature importances and explaining model predictions
- Tuning parameters of the model
- Speeding up the training
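As a rough preview of a few of these topics, the sketch below trains a small CatBoost classifier on raw categorical strings, reweights classes for imbalance, and reads feature importances. It assumes the `catboost` package is installed; the toy rows, labels, and the helper name `train_and_inspect` are hypothetical and are not the tutorial's actual materials.

```python
# Hypothetical toy data: [age, city]. Column 1 is categorical and is passed
# to CatBoost as raw strings, with no manual encoding.
ROWS = [
    [25, "london"], [32, "paris"], [47, "london"], [51, "berlin"],
    [38, "paris"], [29, "berlin"], [44, "london"], [35, "paris"],
]
LABELS = [0, 0, 1, 1, 0, 0, 1, 0]  # mildly imbalanced: three positives, five negatives
CAT_FEATURES = [1]                 # indices of the categorical columns


def train_and_inspect(rows=ROWS, labels=LABELS, cat_features=CAT_FEATURES):
    """Fit a small classifier and return it with its feature importances."""
    # Imported here so that the data definitions above work without the library.
    from catboost import CatBoostClassifier, Pool  # third-party: pip install catboost

    pool = Pool(data=rows, label=labels, cat_features=cat_features)
    model = CatBoostClassifier(
        iterations=50,
        learning_rate=0.1,
        depth=2,
        auto_class_weights="Balanced",  # upweight the rarer class
        verbose=False,
    )
    model.fit(pool)
    return model, model.get_feature_importance(pool)


if __name__ == "__main__":
    model, importances = train_and_inspect()
    print("feature importances:", list(importances))
    print("class probabilities:", model.predict_proba([[40, "london"]]))
```

The parameter values here are arbitrary choices for a tiny dataset, not tuned recommendations. The library also provides a `cv` function for cross-validation and a `task_type="GPU"` training option, which correspond to the cross-validation and speed-up topics listed above.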
