Sunday 2:15 PM–3:00 PM in Modeling & Data Techniques - Rm 100A

Using Sockeye Neural Machine Translation in a Streaming Pipeline

Jeff Zemerick

Audience level:


The World Wide Web contains text in many languages, and modern systems often cannot be restricted to a single locale. Making use of text in other languages requires a pipeline that can scale. We'll describe and demonstrate how to build a streaming pipeline that consumes, preprocesses, and translates incoming text.


Sockeye is a neural machine translation application written in Python and built on top of Apache MXNet. Apache Flink is a scalable stream processing framework well suited to handling large volumes of incoming text. In this talk we will present an overview of neural machine translation and introduce Sockeye and Flink. We will then see how Sockeye and Flink can be used together to build a scalable streaming text translation pipeline. Developers will leave this talk with a better understanding of neural machine translation and with access to the code repositories, so they can experiment with their own translation pipelines.
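
To make the consume, preprocess, and translate stages concrete, here is a minimal sketch in plain Python. It is not the code from the talk and it uses neither the Flink nor the Sockeye API. The stage functions, `placeholder_translator`, and `run_pipeline` are all hypothetical names chosen for illustration. In a real pipeline a streaming framework would drive the stages, and the translator callable would wrap a trained translation model.

```python
import unicodedata
from typing import Callable, Iterable, Iterator, List


def consume(source: Iterable[str]) -> Iterator[str]:
    """Yield raw text records one at a time, as a stream would deliver them."""
    for record in source:
        yield record


def preprocess(records: Iterable[str]) -> Iterator[str]:
    """Normalize Unicode, collapse whitespace, and drop empty records."""
    for record in records:
        cleaned = " ".join(unicodedata.normalize("NFC", record).split())
        if cleaned:
            yield cleaned


def translate(records: Iterable[str],
              translator: Callable[[str], str]) -> Iterator[str]:
    """Apply a translator callable to each preprocessed record."""
    for record in records:
        yield translator(record)


def placeholder_translator(text: str) -> str:
    # Hypothetical stand-in: a real pipeline would call a translation
    # model here. This function only tags the text so the flow is visible.
    return f"[translated] {text}"


def run_pipeline(source: Iterable[str],
                 translator: Callable[[str], str] = placeholder_translator
                 ) -> List[str]:
    """Chain the three stages lazily and collect the results."""
    return list(translate(preprocess(consume(source)), translator))


if __name__ == "__main__":
    for line in run_pipeline(["  Hallo   Welt ", "", "Guten Tag"]):
        print(line)
```

Because each stage is a generator, records flow through one at a time instead of being collected in memory first, which is the property a streaming framework provides at scale. Passing the translator as a callable keeps the pipeline independent of any particular translation engine.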
