Saturday 11:00–11:45 in LG6

High-Performance Distributed TensorFlow: Request Batching and Model Post-Processing Optimizations

Chris Fregly

Audience level:
Intermediate

Description

In this completely demo-based talk, Chris will demonstrate various techniques to post-process and optimize trained TensorFlow models to reduce deployment size and increase prediction performance.

Abstract

First, using techniques such as 8-bit quantization, weight rounding, and batch-normalization folding, we'll simplify the model's forward-propagation and prediction path.
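To make the quantization step concrete, here is a minimal pure-Python sketch of 8-bit linear quantization, the idea behind TensorFlow's `quantize_weights` graph transform. The function names and sample weights are illustrative, not the tool's actual code:

```python
# Illustrative sketch: map float weights onto 256 integer levels between
# the minimum and maximum weight, then recover approximate floats.

def quantize_8bit(weights):
    """Quantize a list of floats to 8-bit integer levels (0..255)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # guard against all-equal weights
    return lo, scale, [round((w - lo) / scale) for w in weights]

def dequantize(lo, scale, qweights):
    """Recover approximate float weights from the 8-bit representation."""
    return [lo + q * scale for q in qweights]

weights = [0.013, -0.247, 0.891, 0.004, -0.555]
lo, scale, q = quantize_8bit(weights)
restored = dequantize(lo, scale, q)

# Each restored weight is within half a quantization step of the original,
# while the stored representation shrinks from 32-bit floats to 8-bit ints.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The deployment-size win comes from storing `q` as one byte per weight plus two floats (`lo`, `scale`) per tensor, at the cost of a bounded rounding error.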

Next, we'll load-test and compare the optimized and unoptimized models, both with request batching enabled and disabled.
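In TensorFlow Serving, request batching is switched on with the `--enable_batching` flag and tuned through a batching-parameters file passed via `--batching_parameters_file`. A minimal example of such a file, with values chosen purely for illustration:

```
max_batch_size { value: 32 }          # never batch more than 32 requests
batch_timeout_micros { value: 5000 }  # wait up to 5 ms for a batch to fill
num_batch_threads { value: 4 }        # parallelism for processing batches
max_enqueued_batches { value: 100 }   # backpressure limit on the queue
```

The timeout is the key knob in a load test: a larger `batch_timeout_micros` tends to raise throughput (fuller batches) at the cost of per-request latency.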

Last, we'll dive deep into Google's TensorFlow Graph Transform Tool to build custom model optimization functions.
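The Graph Transform Tool works by walking the model's graph node by node, dropping or rewriting nodes and reconnecting their consumers. Real custom transforms are registered in C++ inside TensorFlow; the hypothetical pure-Python sketch below only illustrates the pattern, using dicts in place of graph-def nodes:

```python
# Hypothetical sketch of a custom graph transform: remove no-op Identity
# nodes and rewire their consumers to the real producer. This mirrors the
# shape of a Graph Transform Tool pass, not its actual API.

def remove_identity_nodes(nodes):
    """Drop Identity nodes and reconnect consumers to their true inputs."""
    # Map each Identity node's name to the single input it forwards.
    forwarded = {n["name"]: n["inputs"][0]
                 for n in nodes if n["op"] == "Identity"}

    def resolve(name):
        # Follow chains of Identity nodes back to the real producer.
        while name in forwarded:
            name = forwarded[name]
        return name

    return [
        {"name": n["name"], "op": n["op"],
         "inputs": [resolve(i) for i in n["inputs"]]}
        for n in nodes
        if n["op"] != "Identity"
    ]

graph = [
    {"name": "x",    "op": "Placeholder", "inputs": []},
    {"name": "x_id", "op": "Identity",    "inputs": ["x"]},
    {"name": "relu", "op": "Relu",        "inputs": ["x_id"]},
]
optimized = remove_identity_nodes(graph)
assert [n["name"] for n in optimized] == ["x", "relu"]
assert optimized[1]["inputs"] == ["x"]  # relu now reads directly from x
```

Fewer nodes on the forward path means fewer ops executed per prediction, which is exactly the effect the built-in transforms (such as `fold_batch_norms`) aim for.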
