PyData New York City 2019 - Presentation: From Raw Recruit Scripts to Perfect Python (in 90 minutes)

After mastering Python basics, it becomes increasingly difficult to produce well-structured, test-driven code. Code that is fit for purpose in the short term can become stale for lack of composability and testing. In this tutorial, we will go beyond a typical modeling notebook and turn it into well-structured, tested Python code that is easy to understand, maintain and extend.

Whether developing new models in Jupyter Notebooks or porting existing code from older infrastructure or other technologies (e.g. Excel, SAS), data scientists are often faced with disorganized structure, reproducibility issues or low run-time performance.

Model implementation quality and performance play a critical role in successful deployment, continued use and future maintenance costs. Key drivers of this success include modularized code and well-defined tests, both of which often get neglected or left out entirely.

In this tutorial, we will start with a Jupyter Notebook that represents a sample model with typical shortcomings, such as input data processing mixed with model logic, missing tests, a lack of usage examples, or confusing code.

In a series of steps, we will incrementally refactor the code into intuitive, modular Python, using the best tools from the Python ecosystem.
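As a minimal sketch of what one such refactoring step might look like, the block below separates input parsing from model logic and gives each function a docstring with a usage example. The model (a discounted sum of cash flows) and the names `parse_cash_flows` and `present_value` are hypothetical illustrations, not the notebook used in the tutorial.

```python
import csv


def parse_cash_flows(lines):
    """Parse CSV lines with a 'cash_flow' column into a list of floats.

    Input handling lives here, so the model function below never touches
    raw text. (Hypothetical helper, for illustration only.)

    >>> parse_cash_flows(["cash_flow", "100", "50"])
    [100.0, 50.0]
    """
    reader = csv.DictReader(lines)
    return [float(row["cash_flow"]) for row in reader]


def present_value(cash_flows, rate):
    """Discount cash flows paid at the end of periods 1, 2, ...

    Pure model logic: no file access, no parsing, easy to test in isolation.

    >>> present_value([125.0], 0.25)
    100.0
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    import doctest

    # Run the docstring examples as lightweight tests.
    doctest.testmod()
```

Because `present_value` takes plain Python values, it can be tested without any input files, and the docstring examples double as usage documentation.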

You will learn to

structure your code in composable and re-usable blocks with in-line documentation and examples
catalog and organize boilerplate data sourcing (using Intake)
use automated testing (pytest and Hypothesis) and static code analysis (Pylint) to safeguard code quality and reproducibility
analyze performance (cProfile, line_profiler) to identify hot-spots and guide run-time optimization
apply just-in-time (JIT) compilation and vectorization using Numba for even faster performance
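
To illustrate the profiling item above, here is a small sketch that uses only the standard-library `cProfile` and `pstats` modules to locate a hot-spot. The function `slow_sum_of_squares` and the helper `profile_call` are hypothetical stand-ins, not code from the tutorial.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop standing in for a model hot-spot (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the report in memory instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

The report lists each profiled function with its call count and timings; functions that dominate the cumulative time are the natural candidates for vectorization or JIT compilation.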

This tutorial is for you if you

want to take the next step after beginner Python tutorials
mainly use Jupyter Notebooks for your work and want to add more tools to your toolbox
want to help your team in improving code quality
are in the process of migrating code or models from other technologies (SAS, Excel) and want to use best practices from the start

Wednesday 1:15 PM–2:45 PM in Broadway (5202)

Stanley van der Merwe, Petr Wolf
