Friday 15:30–17:00 in A130

The path between developing and serving machine learning models.

Adrin Jalali

Audience level:
Experienced

Description

As a data scientist, one of the challenges you face after developing and training a model is deploying it to production, where other systems use its output in real time. In this tutorial we use PipelineIO to deploy a cluster in the cloud, which gives us a JupyterHub for developing our method and uses PMML to persist, deploy, and serve the model.

Abstract

When your pipeline includes a machine learning module, persisting and serving the model is still not a trivial task. This tutorial shows how an open source framework, built on several open source technologies, could potentially solve the problem.

My journey started with this[1] question on StackOverflow. I wanted to do my usual data science work, mostly in Python, and then deploy the trained models somewhere as a REST API that responds to requests in real time using their output. My original line of thought was this workflow:

At that point I had been reading and watching tutorials and attending meetups related to these technologies. I was looking for a solution better than:

This just sounded wrong or, at best, not scalable. After a bit of research, I came across PipelineIO[2,3], which seems to promise exactly what I'm looking for. In this tutorial we use PipelineIO to deploy a cluster in the cloud, which gives us a JupyterHub to develop our method and uses PMML to persist, deploy, and serve the model. My own journey with PipelineIO and my take on it are documented on GitHub[4]. I'll use Amazon AWS, but PipelineIO uses Kubernetes, so you can easily deploy it in any environment in which you can use Kubernetes.
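To make the train, persist, and serve pattern concrete, here is a minimal sketch using only the Python standard library. It is not PipelineIO's API and it does not use PMML: the model is a toy least-squares line, it is persisted as JSON purely for illustration, and every name in it (train, dump_model, load_model, make_handler, serve, the "x" and "prediction" fields) is hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def train(xs, ys):
    """Fit y = slope * x + intercept by least squares (stand-in for real training)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Score a single input with the fitted parameters."""
    return model["slope"] * x + model["intercept"]


def dump_model(model):
    """Persist the model in a language-neutral text format (JSON here, not PMML)."""
    return json.dumps(model)


def load_model(text):
    """Restore a model previously written by dump_model."""
    return json.loads(text)


def make_handler(model):
    """Build a request handler class that answers POST requests with predictions."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the JSON request body, e.g. {"x": 4}.
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            body = json.dumps({"prediction": predict(model, payload["x"])}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PredictHandler


def serve(model_text, port=8000):
    """Load a persisted model and serve it over HTTP until interrupted."""
    server = HTTPServer(("127.0.0.1", port), make_handler(load_model(model_text)))
    server.serve_forever()
```

Calling serve(dump_model(train(xs, ys))) would start a single-process server that answers a POST body such as {"x": 4} with a JSON prediction. A setup like this is exactly the kind that does not scale on its own, which is the gap a cluster-based framework is meant to fill.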

If you work in an environment with several machine learning modules that must run in production in real time, as part of a stream processing pipeline, this talk is for you.

[1] http://stackoverflow.com/questions/42719953/how-to-develop-a-rest-api-using-an-ml-model-trained-on-apache-spark

[2] http://pipeline.io

[3] https://github.com/fluxcapacitor/pipeline

[4] https://github.com/adrinjalali/pipeline-docs
