Sunday 15:15–16:00 in Hörsaal 3

CatBoost: Fast Open-Source Gradient Boosting Library For GPU

Vasily Ershov

Audience level:
Novice

Description

CatBoost (http://catboost.yandex) is a new open-source gradient boosting library that outperforms existing publicly available gradient boosting implementations in terms of quality. It also has a major advantage: a very fast GPU training implementation. In this talk we will walk you through the major features of the library and explain why and how you should use GPU training.

Abstract

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others. CatBoost (http://catboost.yandex) is a new open-source gradient boosting library that outperforms existing publicly available gradient boosting implementations in terms of quality. It has a set of additional advantages, among them the fastest GPU training implementation of all existing open-source GBDT libraries.

In this talk we'll give a brief overview of the problems that can be solved with CatBoost, then walk you through the main features of the library. We will also explain how to use GPUs to accelerate gradient boosting on decision trees. We'll present benchmarks showing that our GPU implementation is 5 to 40 times faster than Intel server CPUs, along with a performance comparison against the GPU implementations of gradient boosting in other open-source libraries.
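For attendees who want to try GPU training ahead of the talk, the sketch below shows roughly what it looks like with CatBoost's Python API. The synthetic dataset and the specific hyperparameter values are purely illustrative; task_type="GPU" and devices are the documented switches for GPU training in recent CatBoost versions, but check the version you have installed.

    # Minimal sketch: GPU training with CatBoost (illustrative settings only)
    from catboost import CatBoostClassifier, Pool
    from sklearn.datasets import make_classification

    # Synthetic data stands in for a real problem with heterogeneous features
    X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
    train_pool = Pool(X, y)

    model = CatBoostClassifier(
        iterations=500,
        learning_rate=0.05,
        task_type="GPU",   # enable GPU training; "CPU" is the default
        devices="0",       # GPU device id(s) to use
        verbose=100,
    )
    model.fit(train_pool)

The same task_type switch applies to CatBoostRegressor, so moving an existing CPU training script to the GPU typically requires only a parameter change rather than a rewrite.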
