Thursday, October 28, 7:30 PM – 8:00 PM, in Talks I

Bodo: Supercomputing-Like Performance and Scale for Python/Pandas

Ehsan Totoni

Prior knowledge:
No previous knowledge expected

Summary

Bodo is a new compute engine built on novel just-in-time (JIT) inferential compiler technology that brings supercomputing-like performance and scalability to native Python analytics code. Bodo automatically parallelizes Python/Pandas code, allowing applications to scale to 10,000+ cores and petabytes of data, and is orders of magnitude faster than alternatives such as Spark and Dask.

Description

Python is often praised for its simplicity but criticized for its low performance and limited scalability. Bodo is a new compute engine that brings supercomputing-like performance and scalability to native Python analytics code. Bodo automatically parallelizes Python/Pandas code, allowing applications to scale to 10,000+ cores and petabytes of data without any rewrites into Scala, C++, or non-native APIs. This makes Python the best solution for challenging data engineering tasks such as ETL, data preparation, and featurization. It is made possible by new just-in-time (JIT) inferential compiler technology that automatically performs optimizations that usually require the effort of world-class performance experts. We will discuss how this technology works, present examples and benchmarks, and explain why Bodo is orders of magnitude faster than alternatives such as Spark and Dask.
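A minimal sketch of what "no rewrites into non-native APIs" looks like in practice. It assumes Bodo's `bodo.jit` decorator and is not taken from the talk. The function body is ordinary Pandas. Bodo's compiler, which is not shown here, is what would parallelize it. The function name `revenue_by_region`, the column names, and the sample data are hypothetical. If Bodo is not installed, the sketch falls back to a no-op decorator so the same code runs as plain Pandas.

```python
import pandas as pd

# Assumption: Bodo provides a `bodo.jit` decorator. If Bodo is unavailable,
# use a no-op decorator so the function still runs as regular Pandas.
try:
    import bodo
    jit = bodo.jit
except ImportError:
    def jit(func):
        return func


@jit
def revenue_by_region(df):
    # Plain Pandas ETL: drop rows with no units sold, derive revenue,
    # then aggregate revenue per region. No Spark/Scala-style API is needed.
    sold = df[df["quantity"] > 0]
    out = pd.DataFrame({
        "region": sold["region"],
        "revenue": sold["quantity"] * sold["price"],
    })
    return out.groupby("region", as_index=False).sum()


if __name__ == "__main__":
    # Hypothetical sample data.
    sales = pd.DataFrame({
        "region": ["east", "west", "east", "west"],
        "quantity": [2, 0, 3, 1],
        "price": [10.0, 5.0, 4.0, 7.0],
    })
    print(revenue_by_region(sales))
```

The only Bodo-specific line is the decorator. This is the design point the abstract emphasizes: the analytics logic stays in native Python/Pandas rather than being ported to another API.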