Tuesday 4:30 PM–5:15 PM in Central Park West (6501)

Reproducibility in ML Systems: A Netflix Original

Ferras Hamad

Audience level:
Intermediate

Description

In this talk, we focus on the importance and benefits of effortless reproducibility as a first-class construct in machine learning infrastructure. From practices around capturing inputs and outputs to code versioning and transitive dependency management.

Abstract

As more machine learning initiatives developed into business critical systems it became increasingly important to provide a means of guaranteeing some degree of reproducibility around them. Although reproducibility is often talked about as a binary construct, either something is reproducible or it isn’t, in practice when it comes to machine learning is best described as a spectrum. It is usually very expensive and difficult to make everything completely reproducible in every situation but often times even offering some level of reproducibility can yield great short-term and long-term benefits.

With the high degree of reproducibility guaranteed by the approach described in this talk we were able to extract multiple benefits such as rapid prototyping, improved debugging, lower barriers to collaboration, and better reliability. This has allowed scheduled workflows to run continuously for months and years with little to no human intervention and reduced the engineering footprint required for each data science related project.

Subscribe to Receive PyData Updates

Subscribe