Sunday 9:00 AM–10:30 AM in Fertitta Hall, Room LL105

Learning to Scale Data Science, Machine Learning, and Pandas with Ray and Modin

Devin Petersohn

Audience level:
Intermediate

Description

In this tutorial, attendees will learn how to use Ray to scale new and existing Python code. The tutorial covers the Ray system architecture, example applications, GPU support, and best practices, and includes material for more comprehensive exercises. It also introduces Modin and shows how Pandas workflows can be scaled by changing a single line of code.

Abstract

Exercise 1: Simple Data Parallel Example

The goal of this exercise is to show how to run simple tasks in parallel.

Attendees will:

Exercise 2: Parallel Data Processing with Task Dependencies

The goal of this exercise is to show how to pass object IDs into remote functions to encode dependencies between tasks.

Attendees will:

Exercise 3: Tree Reduce

The goal of this exercise is to show how to implement a tree reduce in Ray by passing object IDs into remote functions to encode dependencies between tasks.

Attendees will learn how to write their own tree reduce in Ray.

Exercise 4: Nested Parallelism

The goal of this exercise is to show how to create nested tasks by calling a remote function inside another remote function.

Attendees will learn how to write a parallel hyperparameter sweep in Ray.

Exercise 5: Handling Slow Tasks

The goal of this exercise is to show how to use ray.wait to avoid waiting for slow tasks.

Attendees will:

Exercise 6: Process Tasks in Order of Completion

The goal of this exercise is to show how to use ray.wait to process tasks in the order that they finish.

Attendees will:

Exercise 7: Introducing Actors

The goal of this exercise is to show how to create an actor and how to call actor methods.

Attendees will:

Exercise 8: Actor Handles

The goal of this exercise is to show how to pass around actor handles.

Attendees will:

Exercise 9: Speed up Serialization

The goal of this exercise is to illustrate how to speed up serialization by using ray.put.

Attendees will:

Exercise 10: Using the GPU API

The goal of this exercise is to show how to use GPUs with remote functions and actors.

Attendees will:

Exercise 11: Custom Resources

The goal of this exercise is to show how to use custom resources.

Attendees will:

Exercise 12: Pass Neural Net Weights Between Processes

The goal of this exercise is to show how to send neural network weights between workers and the driver.

Attendees will:

Exercise 13: Modin: Speeding Up Pandas Workflows by Changing One Line of Code

The goal of this exercise is to show how to use Modin to speed up Pandas workflows and interact with data.

Attendees will:
