Friday 3:45 PM–5:15 PM in Room 2

Using Dask for Parallel Computing in Python

Skipper Seabold

Audience level:
Intermediate

Description

Dask is a relatively new library for parallel computing in Python. It builds on data structures familiar to users of the PyData stack and lets them scale their work across one or many machines. This tutorial will introduce users to the core concepts of dask by working through example problems. The tutorial materials will be distributed as Jupyter Notebooks.

Abstract

The tutorial will introduce users to the core concepts of dask, including:

  • the dask scheduler
  • dask graphs
  • the dask caching layer
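
As a rough illustration of the graph idea listed above: a dask graph can be written as a plain Python dict whose keys name results and whose values are either literal data or tuples of a callable followed by its arguments. The sketch below builds such a dict and walks it with a small single-threaded evaluator. The names toy_get and inc are hypothetical and written only for this example; toy_get is not dask's scheduler, and its dict-based memoization is only a stand-in for the idea of reusing intermediate results, not the dask caching layer itself.

```python
from operator import add, mul


def inc(x):
    """Hypothetical helper used only in this example."""
    return x + 1


# A task graph in plain-dict form: each key names a result, and each
# tuple is (callable, *arguments). String arguments that match a key
# refer to the result of that other task.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
    "w": (mul, "z", "y"),
}


def toy_get(graph, key, cache=None):
    """Toy single-threaded evaluator (an assumption for illustration,
    not dask's scheduler). Resolves dependencies recursively and
    memoizes results so shared dependencies are computed once."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # This toy only treats string arguments as references to keys.
        resolved = [
            toy_get(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        # Anything that is not a task tuple is treated as literal data.
        result = task
    cache[key] = result
    return result


result = toy_get(graph, "w")
print(result)
```

A real scheduler would additionally decide which independent tasks can run at the same time, on threads, processes, or multiple machines; the toy above only shows how the dependency structure is encoded.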

The tutorial will also introduce dask's core data structures, including:

  • dask arrays
  • dask bags
  • dask data frames

We will use both real and generated data sets to learn and reinforce these concepts.