Data scientists in a large team often spend a substantial amount of their time solving a set of common, organization-specific issues: data wrangling, connections to external systems, and logging configuration, among others. While not every use case is identical, foundational tooling that simplifies or abstracts away these common tasks can greatly improve the productivity of data scientists in an organization.
At 84.51˚, our team is responsible for designing and implementing this infrastructure for a department of over 250 data scientists. We maintain about 15 heavily used packages, web apps, code templates, and other tools, which are collectively used approximately 5,000 times per day. Through this experience, we've learned the principles of building out (and maintaining!) a rich ecosystem of tooling that users appreciate, and during this talk we'll share our own journey and our recommendations for other organizations that could benefit from something similar.
This talk will be mostly high level, with little code shared. While we will discuss our internal infrastructure, we'll dedicate the majority of the session to concrete advice, referring to our own experience mainly as supporting evidence.