Sunday 11:45–12:30 in Audimax

Simplifying Training & Serving Deep Learning Models with Big Data in Python using TensorFlow

Holden Karau

Audience level:
Intermediate

Description

Deep Learning, in addition to being a world-class tool for detecting the presence of cats, requires large amounts of data for training. As much as vendors may say "no data prep required", they are all excessively optimistic*. This talk will look at tools to build a deep learning pipeline with feature prep on top of existing big data technologies without rewriting your code for on-line serving.

Abstract

More Serious Business Kitty Description:

While some deep learning systems promise to require no data preparation or cleaning, in practice many folks find that training their models effectively requires some data preparation, and we often spend more time on data preparation than on anything else. This talk will examine tools for data preparation that can be used at scale on "big data", and then how to use their results on-line at serving time (where we hopefully no longer need a cluster to make a prediction for each new user).

Less Serious Business Kitty Description:

Deep Learning, in addition to being a world-class tool for detecting the presence of cats, requires large amounts of data for training. As much as vendors may say "no data prep required", they are all lying*. This talk will look at tools to build a deep learning pipeline with feature prep on top of existing big data technologies without rewriting your code for serving.

Traditionally, feature prep done in a big data system like Spark, Flink, or Beam would have to be rewritten for the on-line serving component. This is about as much fun as rewriting our sample Python code in Java, which for some reason is what a lot of companies associate with "production." Come for the deep learning buzzwords, stay for how to perform on-line serving without writing Java code.
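To make the "write feature prep once" idea concrete, here is a minimal sketch in plain Python. It is not the tooling the talk covers: the function names (`fit_prep`, `prep_record`), the field names, and the toy data are all made up for illustration, and a list of dicts stands in for a distributed dataset. The point is only the shape of the approach: the batch job computes statistics over the whole training set, and the same per-record function is then reused at serving time with those saved statistics.

```python
import json
import math


def fit_prep(rows):
    """Batch step: compute the statistics feature prep needs from all training rows.

    In a real pipeline this pass would run on a cluster; here it is a plain loop.
    """
    ages = [r["age"] for r in rows]
    mean = sum(ages) / len(ages)
    var = sum((a - mean) ** 2 for a in ages) / len(ages)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on constant columns
    vocab = sorted({r["pet"] for r in rows})
    return {"age_mean": mean, "age_std": std, "pet_vocab": vocab}


def prep_record(record, stats):
    """Per-record step: the SAME code runs in batch training and in on-line serving."""
    scaled_age = (record["age"] - stats["age_mean"]) / stats["age_std"]
    one_hot = [1.0 if record["pet"] == p else 0.0 for p in stats["pet_vocab"]]
    return [scaled_age] + one_hot


# Hypothetical training data (stand-in for a big-data source).
training_rows = [
    {"age": 2, "pet": "cat"},
    {"age": 4, "pet": "dog"},
    {"age": 6, "pet": "cat"},
]

# Training time: fit the statistics, then prepare every row.
stats = fit_prep(training_rows)
training_features = [prep_record(r, stats) for r in training_rows]

# Ship the statistics alongside the model (JSON here, purely as an assumption).
saved = json.dumps(stats)

# Serving time: no cluster, just the saved statistics and the same function.
serving_stats = json.loads(saved)
online_features = prep_record({"age": 4, "pet": "cat"}, serving_stats)
print(online_features)
```

Because the serving path calls the same `prep_record` with the statistics computed during training, there is no second implementation to keep in sync, which is the property the paragraph above is after.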

*All vendors are optimists when it comes to their own products, including the vendors who pay Holden and Gris, but they pay us so it's OK.
