Friday 9:00–10:30 in Tower Suite 2

Maintainable Code in Data Science

Kevin Lemagnen

Audience level:
Intermediate

Description

Notebooks are great: they let you explore your data and prototype models quickly. But they make it hard to follow good software practices. In this tutorial, we will go through a case study. We will see how to refactor our code into a testable and maintainable Python package with entry points to tune, train and test our model, so it can easily be integrated into a CI/CD flow.

Abstract

Notebooks are great: they let you explore your data and prototype models quickly. But they make it hard to follow good software practices such as versioning, testing, or writing clean, modular and reusable code. In this tutorial, we will go through a case study with a full model developed in a notebook. We will see how to refactor our code into a testable and maintainable Python package with entry points to tune, train and test our model, so it can easily be integrated into a CI/CD flow.

To do so, we will leverage tools available in sklearn, such as ColumnTransformer, custom transformers and pipelines.
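
As a rough illustration of how these sklearn pieces can fit together, here is a minimal sketch. It is not the code from the tutorial: the `LogTransformer` class, the `build_pipeline` function, the column names and the toy data are all hypothetical, chosen only to show a custom transformer and a ColumnTransformer combined in a single pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class LogTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer: applies log(1 + x) to its input."""

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn from the data.
        return self

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


def build_pipeline(numeric_cols, categorical_cols):
    """Assemble preprocessing and model into one object (illustrative only)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("log", LogTransformer()),
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Toy data, invented for this sketch.
df = pd.DataFrame({
    "income": [20.0, 35.0, np.nan, 80.0, 55.0, 15.0],
    "city": ["a", "b", "a", "c", "b", "a"],
})
y = np.array([0, 0, 0, 1, 1, 0])

pipeline = build_pipeline(["income"], ["city"])
pipeline.fit(df, y)
predictions = pipeline.predict(df)
```

Keeping the construction in a function such as `build_pipeline` means the same object can be imported by separate tune, train and test entry points, and each step can be unit-tested outside a notebook.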
