Friday October 29 8:30 PM – Friday October 29 9:00 PM in Talks II

Getting started with Dask using Saturn Cloud

Mitali Sanwal

Prior knowledge:
No previous knowledge expected

Summary

Dask is a powerful Python framework for running code in parallel across multiple machines, speeding up tasks like model training. While parallel computing might seem intimidating, Dask is easy to get started with. Saturn Cloud, the data science cloud platform, is great for quickly spinning up a Dask cluster to use. In this talk we'll show you how to harness the power of Dask with Saturn Cloud.

Description

  • Introduction to parallel computing - how writing code to run on multiple machines differs from writing it for a single machine.
  • Basics of Dask - what Dask is and how it works.
  • Dask collections (DataFrames, arrays, bags) for computing on data in parallel - these data types allow you to abstract away the parallel backend and keep reasoning about data structures you know.
  • Delayed and futures for running complex parallel computations - for cases where you want more control over how the parallel computations run, you can use these interfaces.
  • Spinning up a Dask cluster on Saturn Cloud - how to get started using Dask by having Saturn Cloud do the complex work of setting up the compute cluster for you.
  • Some cool things you can do, like parallel GPU training - there are amazing things you can do with Dask, and we'll quickly walk through a few of them.
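
As a rough illustration of the "delayed" item in the outline, here is a minimal sketch of how a lazy task graph is built and then run with dask.delayed. It is not taken from the talk: the function names square and total are hypothetical, and the local threaded scheduler stands in for a real cluster so the snippet runs on one machine.

```python
from dask import delayed


@delayed
def square(x):
    # Hypothetical toy task; in practice this would be an expensive step.
    return x * x


@delayed
def total(values):
    # Hypothetical reduction step that depends on all the square() tasks.
    return sum(values)


# Calling delayed functions only records tasks in a graph; nothing runs yet.
squares = [square(i) for i in range(5)]
result = total(squares)

# compute() executes the graph. The threaded scheduler runs it locally;
# this is an assumption made so the sketch works without a cluster.
answer = result.compute(scheduler="threads")
print(answer)
```

With a Dask cluster available, the usual pattern is to create a dask.distributed Client connected to it and call compute() without the scheduler argument, so the same graph is sent to the cluster's workers instead of local threads.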