Ibis is a library that provides a unified, pandas-like API on top of both single-node local execution (e.g., pandas) and multi-node remote execution (e.g., BigQuery, Impala). In this talk, we will introduce a new backend for executing Ibis programs on Spark and show how you can write analytics that run on both Spark and pandas.
Pandas is the de facto standard (single-node) DataFrame implementation in Python. However, because it runs on a single machine, pandas struggles to perform well as data grows larger. Spark, on the other hand, has become a very popular choice for analyzing large datasets over the past few years. But there is an API gap between pandas and Spark, so users who switch from pandas to Spark often have to rewrite their programs.
Ibis is designed to bridge this gap between local execution (pandas) and cluster execution (BigQuery, Impala, etc.). With the Spark backend for Ibis presented in this talk, users can move between pandas and Spark using the same code.