Friday, Oct. 9, 2020, 10 a.m.–10:30 a.m., Online

RAPIDS: How to accelerate your data science pipeline by orders of magnitude

Adam Grzywaczewski

Audience level:
Intermediate

Description

This talk discusses RAPIDS, a suite of open source software libraries that lets you execute end-to-end data science and analytics pipelines entirely on GPUs. Because its API was deliberately designed to be consistent with existing data science tools (e.g. pandas DataFrames, scikit-learn), integrating it in the majority of cases requires changing only a few lines of code.
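As a rough illustration of what "a few lines of code change" can look like, the sketch below swaps only the import. It assumes cuDF is installed and a CUDA-capable GPU is present; the fallback to pandas, the toy data, and the column names are assumptions added here so the snippet also runs on a CPU-only machine.

```python
# Use cuDF when it is available; otherwise fall back to pandas.
# The fallback is an assumption for illustration, not part of RAPIDS itself.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical toy data; the column names are made up for this example.
df = xdf.DataFrame({
    "store": ["a", "b", "a", "b", "a"],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# The same pandas-style calls are used with either backend.
df["sales_with_tax"] = df["sales"] * 1.2
totals = df.groupby("store")["sales"].sum().sort_index()

# cuDF objects expose to_pandas() to copy results back to host memory;
# pandas objects have no such method, so this is skipped on the CPU path.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()

print(totals)
```

Only the import line differs between the GPU and CPU paths; the dataframe construction, column arithmetic, and groupby are written once.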

Abstract

Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS utilizes NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS focuses on common data preparation tasks for analytics and data science. This includes a familiar dataframe API that integrates with a variety of machine learning algorithms for end-to-end pipeline acceleration without paying typical serialization costs. RAPIDS also supports multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger datasets.

RAPIDS projects include cuDF, a pandas-like dataframe manipulation library; cuML, a collection of machine learning libraries that will provide GPU versions of algorithms available in scikit-learn; and cuGraph, a NetworkX-like accelerated graph analytics library. Development follows a six-week release schedule, so new features and libraries are always on the way.

RAPIDS provides tight integration with the key deep learning frameworks. This means data processed by RAPIDS can be seamlessly pushed to deep learning frameworks that accept the array interface protocol (__cuda_array_interface__ for GPU arrays) or work with DLPack, such as Chainer, MXNet, and PyTorch.
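The DLPack handoff mentioned above can be sketched on the CPU with NumPy standing in for the GPU libraries. This is an assumption made so the example runs without a GPU: it requires NumPy 1.22 or newer, and in a real RAPIDS pipeline the producer would typically be a cuDF or CuPy object and the consumer a framework call such as torch.from_dlpack.

```python
import numpy as np

# Producer: any array object that implements the __dlpack__ protocol.
# NumPy is used here as a CPU stand-in for a GPU array library.
producer = np.arange(6, dtype=np.float32)

# Consumer: from_dlpack wraps the producer's buffer instead of copying it.
consumer = np.from_dlpack(producer)

# Both arrays refer to the same underlying memory.
shares = np.shares_memory(producer, consumer)

# A write through the producer is therefore visible through the consumer.
producer[0] = 42.0
first_seen_by_consumer = float(consumer[0])

print(shares, first_seen_by_consumer)
```

Because the exchange passes a pointer to existing memory rather than a serialized copy, the cost of the handoff does not grow with the size of the data.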
