Thursday 3:40 PM–4:20 PM in Radio City (#6604)

Unit Testing for Data Scientists

Hanna Torrence

Audience level:
Novice

Description

Many of the available resources on unit testing focus on standard software engineering tasks, but data science work involves some unique challenges. After a refresher on the basic building blocks, you’ll learn the tips and tricks I’ve gathered while writing test suites for several data science libraries at ShopRunner.

Abstract

Have you ever looked at a file full of large data operations, probabilistic models, and database interactions and felt overwhelmed at the thought of crafting a test? Many of us began our careers in academia, where the phrase “unit test” is unlikely to pop up. However, as data science becomes an increasingly integral component of production systems, adding some software engineering skills to the data science stack is vital. I’ll walk through examples featuring common data science use cases and weird gotchas with fixtures, mocks, and my favorite libraries full of testing helper functions. To keep things concrete, we’ll stick with pytest as the testing framework, but many of the lessons apply more broadly. No prior testing experience is necessary, and even those who have a head start will likely learn some new tricks!
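For a flavor of what such tests can look like, here is a minimal sketch. It is not taken from the talk: `average_order_value`, `sample_noise`, and the `fetch_orders` method are hypothetical names invented for illustration. The tests are plain functions with bare `assert` statements, which pytest would collect by their `test_` prefix. The database is replaced with a `unittest.mock.Mock`, and the random draw is made repeatable with a fixed seed.

```python
import random
from unittest.mock import Mock


def average_order_value(db_client):
    """Return the mean of the `total` field, or 0.0 when there are no rows."""
    # `fetch_orders` is a hypothetical method that would normally hit a database.
    rows = db_client.fetch_orders()
    if not rows:
        return 0.0
    return sum(row["total"] for row in rows) / len(rows)


def sample_noise(n, seed):
    """Draw n Gaussian samples from a seeded generator so results are repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def test_average_order_value_uses_mocked_rows():
    # The mock stands in for the database, so no connection is needed.
    db_client = Mock()
    db_client.fetch_orders.return_value = [{"total": 10.0}, {"total": 30.0}]
    assert average_order_value(db_client) == 20.0
    db_client.fetch_orders.assert_called_once()


def test_average_order_value_handles_empty_table():
    db_client = Mock()
    db_client.fetch_orders.return_value = []
    assert average_order_value(db_client) == 0.0


def test_sample_noise_is_repeatable_with_fixed_seed():
    # Same seed, same draws: the test does not depend on chance.
    assert sample_noise(5, seed=42) == sample_noise(5, seed=42)
    assert len(sample_noise(5, seed=42)) == 5
```

Saved in a file such as `test_orders.py` (an assumed name), these would run with the `pytest` command. Passing the database client in as an argument is a design choice that makes the mock easy to inject.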
