In the past few years, Gradient Boosting Machines have appeared in most winning solutions of machine learning contests, becoming something of a new "default" technique. This talk introduces the main ideas behind the technique incrementally, starting from a simple decision tree. We will also discuss some of the available libraries that implement it, applied to a "real life" use case.
Gradient Boosting Machines (GBM) are a general-purpose supervised learning method that achieves top accuracy on a wide range of datasets in practical applications. Deep learning gets all the hype now, but outside specific domains such as images or speech, it is usually outperformed by gradient boosting in the majority of general business domains and supervised learning applications.
XGBoost is an open-source implementation of GBM with an easy-to-use interface from both R and Python, and it has become a favorite tool in Kaggle competitions. Alongside feature engineering, cross-validation, and ensembling, GBM is a key ingredient for achieving top accuracy in many data science competitions and, more importantly, in practical applications.
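To give a feel for that Python interface, here is a minimal sketch using XGBoost's scikit-learn-style API; the dataset and parameter values are placeholders chosen for illustration, not material from the talk.

    from xgboost import XGBClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    # Any binary classification dataset works; breast_cancer is just a stand-in.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A gradient-boosted tree ensemble: each new tree fits the errors of the previous ones.
    model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))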
In this talk we will cover:
- An introduction to supervised learning, decision trees, and random forests.
- Stacking methods: mixing categorical and numerical features.
- A "real life" use case: predicting the condition of a published item (new or used), as in the sketch below.
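A minimal sketch of what that use case could look like, assuming hypothetical column names (price, listing_type, condition) and a plain one-hot encoding for the categorical feature; the real dataset and feature set are covered in the talk itself.

    import pandas as pd
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical listings data: one numerical and one categorical feature plus the target.
    df = pd.DataFrame({
        "price":        [120.0, 15.5, 300.0, 42.0, 99.9, 7.5],
        "listing_type": ["gold", "free", "gold", "silver", "free", "silver"],
        "condition":    ["new", "used", "new", "used", "new", "used"],
    })

    # One-hot encode the categorical column so the trees can split on it.
    X = pd.get_dummies(df.drop(columns="condition"), columns=["listing_type"])
    y = (df["condition"] == "new").astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
    model = XGBClassifier(n_estimators=50, max_depth=2, learning_rate=0.1)
    model.fit(X_train, y_train)
    print(model.predict(X_test))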