Jupyter notebooks are a key part of a data scientist's professional output. Learn how to go from a notebook that presents a report to a reproducible service with the help of Docker. This short walkthrough shows how to write a Dockerfile that packages your work effectively and consistently, so that it can be shared with any audience.
Presenting a Jupyter notebook with your finished work is good, but what if you want to let someone play with the finished product themselves? Sure, you could write README instructions for installing the requirements, link to the data with steps for downloading and processing it, and stash pickled models in persistent file storage, but that introduces a lot of room for error. In this talk, we will walk through the process of building a Docker container that includes processed data, pre-runs models, and lets a user explore the resulting outputs in a familiar Jupyter environment.
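As a taste of what we will build, here is a minimal sketch of such a Dockerfile. It assumes a hypothetical project layout (a requirements.txt, a data/ directory of processed data, a notebooks/ directory, and a train_model.py script that trains the model); it is a starting point, not the talk's exact Dockerfile.

```dockerfile
# Minimal sketch; assumes a hypothetical layout with
# requirements.txt, data/, notebooks/, and train_model.py.
FROM jupyter/scipy-notebook:latest

# Install extra Python dependencies on top of the base image.
COPY --chown=${NB_UID}:${NB_GID} requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Bake the processed data and notebooks into the image.
COPY --chown=${NB_UID}:${NB_GID} data/ ${HOME}/data/
COPY --chown=${NB_UID}:${NB_GID} notebooks/ ${HOME}/notebooks/

# Pre-run the model at build time so the trained artifact ships with the image.
COPY --chown=${NB_UID}:${NB_GID} train_model.py ${HOME}/train_model.py
RUN python ${HOME}/train_model.py

# The base image already launches Jupyter on port 8888.
EXPOSE 8888
```

Pre-running the model at build time trades a larger image for a zero-setup experience: whoever pulls the image gets the data, the trained model, and the notebooks without following any instructions.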
We will discuss the ready-to-run Docker images from the Jupyter project, and the limitations that might lead you to write and push your own images instead. Attendees of the talk will get a working, well-commented Dockerfile that builds a machine learning model and packages up the data for exploration, as well as a cheat sheet of useful commands for working with containers in a Jupyter environment.
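To give a flavor of that cheat sheet, here are a few of the everyday Docker commands it covers; my-notebook, <container-id>, and <dockerhub-user> are placeholders for your own names.

```bash
# Build the image from the Dockerfile in the current directory.
docker build -t my-notebook .

# Run it, publishing Jupyter's default port 8888 to the host.
docker run --rm -p 8888:8888 my-notebook

# List running containers and inspect one's logs.
docker ps
docker logs <container-id>

# Open a shell inside a running container for debugging.
docker exec -it <container-id> /bin/bash

# Tag and push the image to Docker Hub (after `docker login`).
docker tag my-notebook <dockerhub-user>/my-notebook
docker push <dockerhub-user>/my-notebook
```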
Note: No Docker experience is required for this talk, but a working Docker installation and an account on Docker Hub may be useful.