Tuesday 5:05 p.m.–5:45 p.m.

Binder: sharing and reproducing computation

Andrew Osheroff, Jeremy Freeman, Kyle Kelley

Audience level:
Novice

Description

Binder is an open-source system for eliminating barriers between sharing and reproducing computation. It lets anyone take Jupyter notebooks from a GitHub repository and bundle them into a reproducible, executable environment that can be launched from GitHub by clicking a badge. It’s built on Docker and Kubernetes, and provides a flexible, scalable, and extensible platform for sharing computation.

Abstract

The modern science of analysis depends on our ability to share and reproduce our work. Services like GitHub make it simple to share code, and the Jupyter notebook provides a foundation for combining code, results, and ideas into an executable document. But generating an environment in which code is fully executable can be challenging, often requiring complex system configuration. Containerization technologies like Docker set the standard for specifying reproducible environments, but generating and deploying systems of interconnected containers quickly becomes unmanageable.

Binder is an open-source system that aims to eliminate barriers between the notebooks people want to share and the ability for others to execute them.

The Binder web interface accepts a GitHub repository, an environment configuration, and a user-specified list of external services, such as a Spark cluster or a Postgres database. We expose configuration options that are easy for end users, like requirements.txt files for Python projects, and translate them into specifications for one or more Docker images. These containers are then built and distributed across a Kubernetes cluster. By leveraging Kubernetes primitives such as pods, namespaces, and replication controllers, we can flexibly manage resource quotas, ensure inter-deployment isolation, and scale up external services, for example, increasing the number of available workers. Much of this work builds on previous efforts to serve containerized Jupyter notebooks, in particular tmpnb, but with a more flexible platform for container generation and management.
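As a rough illustration of the translation step described above, the following Python sketch turns a repository's file listing into Dockerfile text and builds a Kubernetes ReplicationController manifest for a scalable external service. It is not Binder's actual implementation: the function names, the `python:3-slim` base image, the `/srv/notebooks` path, and the example image and namespace names are all assumptions chosen for the example.

```python
import json


def dockerfile_for_repo(repo_files, base_image="python:3-slim"):
    """Render Dockerfile text for a notebook repository (hypothetical helper).

    repo_files is a list of file names found at the root of the repository.
    """
    lines = [
        f"FROM {base_image}",
        "RUN pip install --no-cache-dir notebook",
        "COPY . /srv/notebooks",
        "WORKDIR /srv/notebooks",
    ]
    # Only install project dependencies when the repo declares them.
    if "requirements.txt" in repo_files:
        lines.append("RUN pip install --no-cache-dir -r requirements.txt")
    lines.append(
        'CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]'
    )
    return "\n".join(lines) + "\n"


def replication_controller(name, image, namespace, replicas=1):
    """Build a v1 ReplicationController manifest as a plain dict.

    The namespace isolates one deployment from another; replicas sets how
    many copies of the pod (for example, service workers) should be running.
    """
    labels = {"app": name}
    return {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical repository contents and service names.
    print(dockerfile_for_repo(["index.ipynb", "requirements.txt"]))
    manifest = replication_controller(
        "spark-worker", "example/spark-worker", "binder-demo", replicas=4
    )
    print(json.dumps(manifest, indent=2))
```

In this sketch, scaling up an external service amounts to resubmitting the manifest with a larger `replicas` value, while giving each deployment its own namespace keeps deployments apart.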

We believe in scientific reproducibility as a public service, and currently maintain a public cluster that hosts Binders. The service runs on Google Compute Engine, but it is designed for portability across different cloud providers. We are eager to work with the community on exploring a wider variety of front-ends -- beyond the Jupyter notebook -- and a more general specification format for computational environments. Try it out at http://mybinder.org, and let us know if you want to get involved!
