Friday 9:00 AM–10:30 AM in Room #1023/1022/1020 (1st Floor)

Pandas from the Inside

Stephen Simmons

Audience level:
Intermediate

Description

Pandas is great for data analysis in Python: intuitive DataFrames from R; fast NumPy arrays under the hood; groupby like in SQL. But this familiarity is deceptive: pandas users often get stuck on things they feel should be simple. This talk looks inside pandas to see how DataFrames actually work when building, indexing and grouping tables. You will learn how to write fast, efficient pandas code.

Abstract

Pandas is a great way to quickly get started with data analysis in Python: intuitive DataFrames from R; fast NumPy arrays under the hood; groupby just like SQL. But this familiarity is deceptive, and both new and experienced pandas users often get stuck on things they feel should be simple.

In this tutorial, we look inside pandas to see how DataFrames actually work when building, indexing and grouping tables. We will learn which pandas operations are fast and why, and how to avoid common performance pitfalls. By the end of the tutorial, you will have a strong and reliable intuition for using pandas effectively.
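
As a rough illustration of the kind of performance pitfall meant here, the sketch below computes per-group totals twice: once with a Python-level loop over rows, and once with a single groupby call that operates on whole columns. It is not taken from the tutorial materials; the data and the names `df`, `loop_totals` and `grouped_totals` are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "key": ["a", "b", "a", "c", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Row-by-row pattern: iterrows() builds a new Series for every row,
# so the work happens in the Python interpreter one row at a time.
loop_totals = {}
for _, row in df.iterrows():
    loop_totals[row["key"]] = loop_totals.get(row["key"], 0.0) + row["value"]

# Column-wise pattern: a single groupby aggregation over the whole column.
grouped_totals = df.groupby("key")["value"].sum()

# Both approaches produce the same totals per key.
assert grouped_totals.to_dict() == loop_totals
```

On a table this small the difference is not noticeable; the column-wise form typically pulls ahead as the number of rows grows, because it avoids per-row Python overhead.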

During this tutorial, you are welcome to follow along on your laptop with the sample data sets and example code in a Jupyter notebook. These will be made available on GitHub a few days before the tutorial. The code targets Python 3 and the latest pandas release (currently 0.18.1).