Saturday 2:15 PM–3:00 PM in Track 1

Speeding up Machine Learning tasks using GPUs in Python

Saloni Jain

Audience level:
Intermediate

Description

GPUs are typically used to accelerate deep learning models, but they have not been widely adopted for traditional machine learning. This talk will cover cuML's GPU-based implementation of the Decision Tree and Random Forest algorithms, which aims to provide a 10x-50x speedup, as well as a new library, the Forest Inference Library (FIL), which enables GPU-accelerated inference of various pretrained forest models.

Abstract

Traditional machine learning models such as Random Forest are well suited to data analysis. However, as datasets grow ever larger, the time required to build an end-to-end pipeline becomes prohibitively long. RAPIDS is an open-source suite of GPU-accelerated libraries that aims to accelerate end-to-end data science pipelines, and cuML is the machine learning library in the RAPIDS ecosystem focused on accelerating traditional machine learning algorithms.
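
As a rough illustration of what that acceleration looks like in code, the sketch below trains a GPU Random Forest with cuML's scikit-learn-style estimator interface. The dataset is synthetic, and the dtypes and parameters shown are assumptions that may vary between cuML versions.

import numpy as np
from cuml.ensemble import RandomForestClassifier  # GPU Random Forest from RAPIDS cuML

# cuML generally expects float32 features and int32 labels
X = np.random.rand(10000, 20).astype(np.float32)
y = np.random.randint(0, 2, size=10000).astype(np.int32)

# Same estimator pattern as scikit-learn: construct, fit, predict
clf = RandomForestClassifier(n_estimators=100, max_depth=16)
clf.fit(X, y)
preds = clf.predict(X)

Swapping the import back to sklearn.ensemble would leave the rest of the snippet essentially unchanged, which is the point of following the shared convention.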

This talk will open with an explanation of the current situation, the need for speed-up, and an overview of the cuML library. It will then take a deep dive into the implementation of two important algorithms, Decision Trees and Random Forest. We will also cover a new library, the Forest Inference Library (FIL), which performs GPU-accelerated inference of pre-trained XGBoost, LightGBM, Scikit-learn Random Forest, and cuML Random Forest models. The talk will also walk through an end-to-end workflow showing how the cuML API design adheres to Scikit-learn conventions, the tradeoffs of using the library, and the conditions under which it delivers the best performance. The aim of this talk is to show data scientists and researchers why they should accelerate their code and how easily they can do so.
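
To make the FIL portion concrete, here is a minimal sketch of loading a pretrained XGBoost model for GPU inference. The model file name is hypothetical, and the load arguments are assumptions that may differ across cuML releases.

import numpy as np
from cuml import ForestInference  # Forest Inference Library (FIL)

# Load a previously trained and saved XGBoost model (file name is hypothetical)
fil_model = ForestInference.load('xgb_model.model',
                                 model_type='xgboost',
                                 output_class=True)

# Inference runs on the GPU; features are passed as float32
X_test = np.random.rand(1000, 20).astype(np.float32)
fil_preds = fil_model.predict(X_test)

The same loaded model object can then serve predictions for new batches without retraining, which is where FIL's speedup over CPU inference is expected to appear.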
