Thursday 9:00 AM–10:30 AM in Music Box 5411/Winter Garden 5412 (5th fl)

Idiomatic Pandas

Ted Petrou

Audience level:
Intermediate

Description

A surprising number of experienced pandas users write poor and inefficient code. Additionally, StackOverflow is littered with outdated yet highly upvoted answers. This tutorial will walk through the most common situations where inefficient code rears its ugly head and show the idiomatic pandas solution for each. The tutorial will alternate between instructor-guided lessons and student exercises.

Abstract

Pandas Basics

Objective - Make sure everyone is properly set up, gauge the skill level of the audience, and gently begin data exploration.

Selection Infection

Objective - Pandas has many confusing indexing operators. The ancient and now deprecated ix was once popular and continues to infect the code of many users because it persists in old StackOverflow answers. The current idiomatic ways of selecting data will be shown.
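
As a rough illustration of the selection style this section points toward, the sketch below uses loc for label-based and iloc for position-based selection. The DataFrame and its column names are made up for the example and are not taken from the tutorial materials.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Chicago"],
        "temp": [35, 22, 18],
        "rain": [0.1, 1.2, 0.7],
    },
    index=["a", "b", "c"],
)

# loc selects by label: row "b", column "temp"
by_label = df.loc["b", "temp"]

# iloc selects by integer position: second row, second column
by_position = df.iloc[1, 1]

# loc also accepts a boolean mask for rows plus a list of column labels
warm = df.loc[df["temp"] > 20, ["city", "temp"]]
```

Keeping labels with loc and positions with iloc removes the guessing that ix had to do when an index held integers.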

Avoiding apply with axis=1

Objective - It is extremely common to think iteratively (for loops) when doing data analysis. The apply method used with axis=1 hides iteration and can be extremely slow. Tips for avoiding this situation and vectorizing as much code as possible will be presented.
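
A minimal sketch of the contrast described here, assuming a small hypothetical table of prices and quantities: the row-wise apply calls a Python function once per row, while the column-wise expressions operate on whole columns at once.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "qty": [3, 1, 4]})

# Row-wise: the lambda runs once per row, which hides a Python-level loop
slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Column-wise: one vectorized multiplication over entire columns
fast = df["price"] * df["qty"]

# Conditional logic can often be vectorized as well, here with numpy.where
label = pd.Series(np.where(df["qty"] > 2, "bulk", "single"), index=df.index)
```

Both approaches give the same totals; on large frames the column-wise version is typically much faster, though the exact gap depends on the data and the pandas version.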

When loops are better than vectorization

Objective - Loops are not necessarily evil. Situations arise where they are useful and, surprisingly, can even be faster than vectorized code.
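
One situation where this can come up is string data. This is an assumption about the kind of case the section has in mind, not something taken from the tutorial. The sketch below computes the same result with the .str accessor and with a plain list comprehension; the names are invented for the example.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
s = pd.Series(["alice smith", "bob jones", "carol king"])

# Accessor version: chained .str calls, each of which walks the Python strings
via_accessor = s.str.split().str[0]

# Explicit loop: a list comprehension that does the same work in one pass
via_loop = pd.Series([name.split()[0] for name in s], index=s.index)
```

Which version is faster depends on the data, the dtype, and the pandas version, so it is worth timing both (for example with timeit) rather than assuming. The comprehension shown also assumes there are no missing values, which the accessor would handle on its own.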

Tidying data before analysis

Objective - A common mistake is to try to produce an end result immediately in a data analysis. Reshaping data into tidy form first can dramatically simplify later operations and improve performance.
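
As a small sketch of what tidying can look like, the example below reshapes a hypothetical wide table (one column per year) into long form with melt, after which a per-year summary is a single groupby. The table and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: the year is stored in the column headers
wide = pd.DataFrame(
    {"state": ["TX", "NY"], "2016": [100, 200], "2017": [110, 190]}
)

# Tidy form: one row per (state, year) observation
tidy = wide.melt(id_vars="state", var_name="year", value_name="sales")

# With tidy data, aggregating by year needs no column-by-column handling
mean_by_year = tidy.groupby("year")["sales"].mean()
```

Once each variable has its own column, filtering, grouping, and plotting by year all use the same column-based operations.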

Miscellaneous

Objective - Pandas is an incredibly powerful tool, but it is also incredibly easy to write poor code with it. A selection of poor but highly upvoted answers from StackOverflow will be compared to more idiomatic solutions.
