Sunday 15:15–17:15 in Track 2

Structuring machine learning models by using pipelines

Paweł Jankiewicz

Audience level:
Intermediate

Description

Writing modular, maintainable code is standard practice in software development. Yet many data science projects are messy and hard to maintain, which reflects how they were built: through experimentation. This tutorial is an introduction to organizing your code in pipelines. We will walk through the code from a Kaggle competition in which Paweł's team took 1st place.

Abstract

I will introduce the concept of pipelines, a very useful pattern for machine learning models. The focus will be on tabular data. After introducing the concepts, we will write a couple of transformations on real datasets. At the end, I will compare pipelines with the functional APIs of deep learning frameworks.

Agenda:
- Scikit-learn pipelines introduction
- Combining Scikit-learn transformations with tabular data using Pandas
- Writing transformations
- Mercari competition code walkthrough
- Comparison with Deep Learning functional APIs

Prerequisites: Scikit-learn
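
To give a flavor of the first three agenda items, here is a minimal sketch of a Scikit-learn pipeline applied to a Pandas DataFrame. It is not the tutorial's or the competition's code: the `LogTransformer` class, the column names, and the toy data are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformation: log(1 + x) on numeric columns."""

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Toy data invented for illustration (not the Mercari dataset).
df = pd.DataFrame({
    "category": ["shoes", "phone", "shoes", "book"],
    "shipping_cost": [5.0, 0.0, 7.5, 2.0],
    "price": [40.0, 300.0, 55.0, 12.0],
})

# Route DataFrame columns, selected by name, to different transformations.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
    ("num", LogTransformer(), ["shipping_cost"]),
])

# Chain preprocessing and the estimator into one fit/predict object.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),
])

X, y = df[["category", "shipping_cost"]], df["price"]
model.fit(X, y)
predictions = model.predict(X)
```

Because preprocessing and the estimator live in a single object, the same transformations are applied at training and prediction time, and the whole model can be cross-validated or serialized as one unit.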
