Wednesday 1:00 PM–3:00 PM in Track 2 - Baker

Parallelizing Scientific Python with Dask

Jim Crist, David Mertz

Audience level:
Novice

Description

Dask is a flexible tool for parallelizing Python code on a single machine or across a cluster. It builds upon familiar tools in the PyData ecosystem (e.g. NumPy and Pandas) while allowing them to scale across multiple cores or machines. This tutorial will cover both the high-level use of dask collections and the low-level use of dask graphs and schedulers.
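
As a rough sketch (our illustration, not part of the tutorial materials), the high-level interface lets a dask collection stand in for a NumPy array while computing chunk by chunk in parallel:

    import dask.array as da

    # Build a large array out of many smaller NumPy chunks.
    x = da.random.random((10000, 10000), chunks=(1000, 1000))

    # Operations mirror the NumPy API but only build a task graph lazily.
    y = (x + x.T).mean(axis=0)

    # compute() hands the graph to a parallel scheduler for execution.
    print(y.compute())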

Abstract

Dask is a flexible tool for parallelizing Python code on a single machine or across a cluster.

We can think of dask at two levels: high-level collections such as dask.array and dask.dataframe, which mimic NumPy arrays and Pandas DataFrames while scaling across multiple cores or machines, and low-level task graphs together with the schedulers that execute them.

Different users operate at different levels, but it is useful to understand both. This tutorial will cover both the high-level use of dask.array and dask.dataframe and the low-level use of dask graphs and schedulers. Attendees will come away with an understanding of both levels and of how to apply dask to parallelize their own workloads.
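
To give a flavor of the low-level side (again a minimal sketch of ours, not the tutorial's material), a dask graph is a plain dictionary of tasks, and a scheduler's get function executes it:

    from dask.threaded import get

    def add(a, b):
        return a + b

    # A dask graph is just a dict: keys map to values or to
    # (function, arg1, arg2, ...) task tuples referencing other keys.
    graph = {
        'x': 1,
        'y': 2,
        'z': (add, 'x', 'y'),
    }

    # The threaded scheduler walks the graph and runs tasks in parallel
    # where the dependencies allow.
    print(get(graph, 'z'))  # 3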
