Wednesday 1:15 PM–2:45 PM in Broadway (5202)

From Raw Recruit Scripts to Perfect Python (in 90 minutes)

Stanley van der Merwe, Petr Wolf

Audience level:


Once you have mastered Python basics, producing well-structured, test-driven code becomes increasingly difficult. Code that is fit for purpose in the short term can become stale because it lacks composability and tests. In this tutorial, we will go beyond a typical modeling notebook and turn it into well-structured, tested Python code that is easy to understand, maintain and extend.


Whether developing new models in Jupyter Notebooks or porting existing code from older infrastructure or other technologies (e.g. Excel, SAS), data scientists are often faced with disorganized structure, reproducibility issues or low run-time performance.

Model implementation quality and performance play a critical role in successful deployment, continued use and future maintenance costs. Key drivers of this success include modularized code and well-defined tests, both of which are often neglected or left out entirely.

In this tutorial, we will start with a Jupyter Notebook that represents a sample model with typical shortcomings, such as input data processing mixed with model logic, missing tests, a lack of usage examples or confusing code.

In a series of steps, we will incrementally refactor the code into intuitive, modular Python, using the best tools from the Python ecosystem.
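As a rough illustration of the kind of refactoring described here, the sketch below separates input data processing from model logic and adds a small pytest-style test. It is not taken from the tutorial material: the function names (load_rows, predict), the column names and the linear scoring formula are all hypothetical, chosen only to show the separation.

```python
import csv
import io
from typing import Iterable


def load_rows(source: Iterable[str]) -> list[dict[str, float]]:
    """Input processing only: parse CSV text lines into numeric records."""
    reader = csv.DictReader(source)
    return [{key: float(value) for key, value in row.items()} for row in reader]


def predict(row: dict[str, float], weights: dict[str, float], intercept: float = 0.0) -> float:
    """Model logic only: a hypothetical linear score with no I/O inside."""
    return intercept + sum(weights[name] * row[name] for name in weights)


def test_predict_needs_no_input_file() -> None:
    """pytest-style test: the model is exercised without touching any data source."""
    assert predict({"age": 2.0, "income": 3.0}, {"age": 0.5, "income": 1.0}) == 4.0


# Usage: the two pieces compose, yet each can be tested on its own.
sample = io.StringIO("age,income\n2,3\n4,1\n")  # stand-in for a real input file
weights = {"age": 0.5, "income": 1.0}           # hypothetical model parameters
scores = [predict(row, weights) for row in load_rows(sample)]
```

Because predict takes plain dictionaries, swapping the CSV source for a database query or a DataFrame export would leave the model code and its test unchanged.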

You will learn to

This tutorial is for you if you
