Accelerating data science workflows on a GPU using RAPIDS

Harmeet Singh

Audience level:
Intermediate

Description

As data volumes continue to grow, CPU compute is increasingly becoming a bottleneck for data scientists and engineers. Executing the entire data science pipeline on a GPU using RAPIDS, an open-source suite of data science libraries from NVIDIA, can deliver substantial speedups for machine learning and deep learning workloads, without leaving the comfort of Python.

Abstract

In this tutorial, we will start with the basics of GPU computing and then move on to a hands-on session with cuDF and cuML. cuDF is the core GPU DataFrame library in RAPIDS for loading, joining, and manipulating data through a pandas-like API, while cuML is a machine learning library for training and tuning models on a GPU with the ease of scikit-learn.
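
To give a flavor of the API, here is a minimal cuDF sketch, assuming RAPIDS is installed and a CUDA-capable GPU is available; the file names and column names are placeholders for illustration:

    # Load, join, and aggregate data entirely on the GPU with a pandas-like API.
    import cudf

    sales = cudf.read_csv("sales.csv")    # reads straight into GPU memory
    stores = cudf.read_csv("stores.csv")

    # Joins, column math, and groupbys mirror their pandas counterparts.
    merged = sales.merge(stores, on="store_id", how="left")
    merged["revenue"] = merged["units"] * merged["price"]
    summary = merged.groupby("region")["revenue"].sum().reset_index()

    # Convert back to a pandas DataFrame on the CPU when needed.
    print(summary.to_pandas())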

We will use an NVIDIA Tesla T4 GPU in the cloud for the class. The Jupyter notebooks used in the class will be shared via GitHub.

This tutorial is meant for data scientists and engineers with some exposure to NumPy, pandas, and scikit-learn. Anyone who works with large amounts of data or is looking for faster compute will also benefit. Basic Python knowledge is the only prerequisite.

By the end of this workshop, you will be able to:

- Understand the power of GPU computing in data science
- Use cuDF to perform core data preprocessing on a GPU
- Build and train ML models using cuML on a GPU (see the sketch below)
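
The sketch below illustrates the cuML side, again assuming RAPIDS is installed; the dataset, column names, and hyperparameters are illustrative assumptions, and the train_test_split import path may vary across RAPIDS versions:

    # Train and evaluate a classifier on the GPU with a scikit-learn-like API.
    import cudf
    from cuml.model_selection import train_test_split
    from cuml.ensemble import RandomForestClassifier

    df = cudf.read_csv("train.csv")                   # placeholder dataset
    X = df.drop(columns=["label"]).astype("float32")  # cuML expects float32 features
    y = df["label"].astype("int32")

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    clf = RandomForestClassifier(n_estimators=100, max_depth=8)
    clf.fit(X_train, y_train)
    print("accuracy:", clf.score(X_test, y_test))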
