We will present two recent challengers to the XGBoost library: LightGBM (released October 2016) and CatBoost (open-sourced July 2017). Participants will learn the theoretical and practical differences between these libraries. Finally, we will describe how we use gradient boosting libraries at McKinsey & Company.
Gradient boosting has proved to be a very effective method for classification and regression in recent years. Many successful business applications and data science contest solutions have been built around the XGBoost library. It seemed that XGBoost would dominate the field for many years.
Recently, two major players have released their own implementations of the algorithm. The first, LightGBM, comes from Microsoft. Its major advantages are lower memory usage and faster training speed.
The second, CatBoost, was implemented by Yandex. Here, the approach was different: the aim of the library was to improve on the accuracy of state-of-the-art gradient boosting algorithms.
During the talk, participants will learn about the differences in algorithm design, APIs, and performance among these libraries.