Monday 1:00 PM–1:45 PM in The Forum, 4th Floor / NLP

Accelerating Data Science with RAPIDS

Mike Wendt

Audience level:
Intermediate

Description

Data science demands interactive exploration of large volumes of data combined with computationally intensive algorithms and analytics. CPUs are increasingly the limiting factor for these workloads, and a new approach is needed. We will discuss how the GPU Open Analytics Initiative (GoAI) is breaking through this compute barrier with GPU-accelerated libraries such as PyGDF to accelerate data science.

Abstract

  1. Challenges in data science today
    1. Technology interoperability
    2. Compute limitations
  2. Apache Arrow
  3. GPUs for compute (CPUs are the bottleneck)
    1. Deep learning
    2. Machine learning
    3. Data analytics
  4. The GPU Open Analytics Initiative (GoAI)
    1. The GPU Data Frame (GDF)
    2. Python library for GDF (PyGDF)
      1. Performance
      2. API
      3. Tips and tricks
    3. Scaling out to multi-GPU and multi-node via Dask GDF
      1. Performance
      2. API
      3. Tips and tricks
    4. CUDA array interface
      1. Numba + CuPy example
      2. PyTorch support (work in progress)
  5. Future work
    1. GPU Data Frame (GDF)
      1. Planned features
      2. Planned optimizations
    2. Machine learning
    3. Graph analytics
  6. Questions and answers
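The CUDA array interface listed in the outline is a protocol: a producer object exposes a `__cuda_array_interface__` dictionary describing a block of GPU memory, and a consumer library (Numba, CuPy, and so on) reads it to wrap that memory without copying. The following is a minimal, GPU-free sketch of the dictionary's shape and of what a consumer checks. `FakeDeviceArray` and `describe` are hypothetical names, the device pointer is a placeholder integer rather than real GPU memory, and the protocol version number is an assumption (the talk may have used an earlier revision).

```python
import math


class FakeDeviceArray:
    """Hypothetical producer exposing __cuda_array_interface__ (no GPU needed)."""

    def __init__(self, shape, typestr, ptr):
        self._shape = tuple(shape)
        self._typestr = typestr
        self._ptr = ptr  # placeholder integer, NOT a valid device address

    @property
    def __cuda_array_interface__(self):
        # Required keys: shape, typestr, data, version.
        # Omitting "strides" signals a C-contiguous layout.
        return {
            "shape": self._shape,        # tuple of ints
            "typestr": self._typestr,    # NumPy-style type string, e.g. '<f4'
            "data": (self._ptr, False),  # (device pointer, read-only flag)
            "version": 3,                # assumed protocol version
        }


def describe(obj):
    """Consumer side: validate the interface and compute the buffer size
    without dereferencing the pointer."""
    iface = obj.__cuda_array_interface__
    for key in ("shape", "typestr", "data", "version"):
        if key not in iface:
            raise ValueError(f"missing required key: {key}")
    # Simplification: assumes typestr is '<endianness><kind><bytes>', e.g. '<f4'.
    itemsize = int(iface["typestr"][2:])
    nbytes = math.prod(iface["shape"]) * itemsize
    ptr, readonly = iface["data"]
    return {"ptr": ptr, "readonly": readonly, "nbytes": nbytes}


if __name__ == "__main__":
    arr = FakeDeviceArray((2, 3), "<f4", 0xDEADBEEF)
    print(describe(arr))
```

With real GPU memory, a consumer such as `numba.cuda.as_cuda_array` or `cupy.asarray` would wrap the pointer directly instead of only describing it. Because the protocol is just a dictionary, any library can act as producer or consumer without importing the other.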
