Sunday 2:00 PM–2:45 PM in Room 1

When Worlds Collide: Productionalizing a Data Science Model

Tudor Radoaca, Nicole Carlson

Audience level:
Novice

Description

On our first data science project at Shiftgig, the data science and engineering teams had to build software that was production-ready while maintaining the flexibility of a data science sandbox. Although these seem like irreconcilable goals, they forced us to improve inter-team communication and ultimately helped create a great product. We’ll walk through our process and the lessons we learned.

Abstract

Data engineers and data scientists operate under different constraints. Engineers want stable, testable, high-quality codebases. Data scientists want flexibility and sandbox environments in which to test their hypotheses easily. These differences can exacerbate the already difficult inter-team dynamics involved in building a joint product. While working for the first time as a cross-functional team to build a data application, we had to deal with the following issues:

  • Deciding how much documentation about the model the data scientists should provide to the engineers

  • Reconciling differing interpretations of business logic between teams

  • Reimplementing code from IPython notebooks into an abstracted, object-oriented application

  • Predicting future usage of the application given only the current use case

  • Allowing the data team to experiment easily and reimplement functionality directly without always going through the full engineering code review process

This talk will discuss how those problems arose and how the data science and engineering teams solved them while creating a new application.