Data scientists in a large team often spend a substantial amount of their time solving a set of common, organization-specific issues: data wrangling, connections to external systems, and logging configuration, among others. While not every use case is identical, foundational tooling that simplifies or abstracts away these common tasks can greatly improve the productivity of data scientists in an organization.
At 84.51˚, our team is responsible for designing and implementing this infrastructure for a department of over 250 data scientists. We maintain about 15 heavily used packages, web apps, code templates, and other tools, which are collectively used approximately 5,000 times per day. Through this experience, we've learned the principles of building out (and maintaining!) a rich ecosystem of tooling that users appreciate, and during this talk we'll share our own journey and our recommendations for other organizations that could benefit from something similar.
This talk will be mostly high level, with little code shared. While we will discuss our internal infrastructure, we'll dedicate the majority of the session to concrete advice, referring to our own experience mainly as supporting evidence.