PyData Los Angeles 2019 - Presentation: Data and ETL with Notebooks in Papermill

Notebooks have traditionally been a tool for drafting code and avoiding repeated expensive computations while exploring solutions. However, with newer tools like nteract's papermill and scrapbook libraries, a notebook can also serve as a reusable, parameterizable template for execution. We'll look at how to make use of this pattern for data and ETL processes.

Intro

Myself, Netflix, and Why I'm here
What does a Data Platform Team do?
Projects and open source tools discussed in the presentation: papermill, Jupyter, nteract, etc.

Notebooks

What are Jupyter Notebooks?

We'll look at some visual examples and breakdowns of notebooks.

How Notebooks Work

A guide through how a notebook executes and the model it uses to run your code.

Traditional Use Cases

Centered on experimentation and code development.

New Use Cases

For production data and operations without full rewrites of Notebook code.

Papermill

What is papermill?

papermill is a library for executing notebooks programmatically.

How do you use it?

You'll see some examples in Python and with its provided CLI.
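As a rough illustration of the Python side, the sketch below wraps papermill's `execute_notebook` call so that each run writes its own output notebook. The helper names `output_path_for` and `run_notebook`, the `runs` directory, and the example notebook path are assumptions made for illustration and are not taken from the talk.

```python
from pathlib import Path


def output_path_for(input_path: str, run_id: str, out_dir: str = "runs") -> str:
    """Build a per-run output path so every execution is kept as its own notebook.

    Hypothetical helper: the naming scheme is an assumption, not a papermill convention.
    """
    stem = Path(input_path).stem
    return (Path(out_dir) / f"{stem}_{run_id}.ipynb").as_posix()


def run_notebook(input_path: str, run_id: str, parameters: dict) -> str:
    """Execute a notebook with papermill and return the path of the output notebook."""
    # Imported lazily so the path helper above works without papermill installed.
    import papermill as pm

    output_path = output_path_for(input_path, run_id)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    # papermill injects these values in a new cell after the cell tagged "parameters".
    pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path
```

A call such as `run_notebook("etl/daily_load.ipynb", "2019-12-04", {"run_date": "2019-12-04"})` would leave the input notebook untouched and write the executed copy under `runs/`. The CLI takes the same pieces as arguments, for example `papermill etl/daily_load.ipynb runs/daily_load_2019-12-04.ipynb -p run_date 2019-12-04`.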

How does it fit into the Notebook model?

We'll relate papermill's execution back to the original notebook execution diagrams.

How to extend papermill

Quick pointer to the extensibility of the library and how to add new functionality.

Using papermill in production data pipelines

Operationalizing Notebooks

Failure analysis, Productionalization, Sharing executions...

DAGs of Notebooks

Making a pipeline with Notebooks.
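One way to picture a pipeline of notebooks is as a dependency graph that is executed in topological order. The sketch below is a minimal version of that idea using the standard library's `graphlib` (Python 3.9+); it is not the scheduler discussed in the talk. The function `run_pipeline`, the `run_step` callback, and the notebook names are hypothetical. In practice `run_step` might call papermill for each notebook.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(dag: Dict[str, Set[str]], run_step: Callable[[str], None]) -> List[str]:
    """Run each notebook after all of its upstream notebooks.

    `dag` maps a notebook path to the set of notebooks it depends on.
    `run_step` is called once per notebook; an exception raised by it
    propagates and stops the remaining steps.
    """
    order = list(TopologicalSorter(dag).static_order())
    for notebook in order:
        run_step(notebook)
    return order


if __name__ == "__main__":
    # Hypothetical three-step ETL chain: extract, then transform, then load.
    example_dag = {
        "transform.ipynb": {"extract.ipynb"},
        "load.ipynb": {"transform.ipynb"},
    }
    run_pipeline(example_dag, lambda nb: print(f"would execute {nb}"))
```

Keeping the execution step as a callback makes the ordering logic easy to test without running any notebooks.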

Integration Testing

Good practices where unit testing doesn't fit.

Usage at Netflix

A quick note on adoption and usage at Netflix.

Wednesday 2:00 PM–2:45 PM in Track 2 Room

Matthew Seal
