PyData Seattle 2017 - Presentation: D’oh! Unevenly spaced time series analysis of The Simpsons in Pandas

Data scientists at Indeed occasionally analyze time series data in which the events of interest are unevenly spaced. For example, when we want to understand how a change to a user interface for Indeed Hire recruiters affects the time it takes them to review candidates, we might look at changes in the time intervals between individual candidate dispositions in our logs. When we want to understand the ratio of new business to repeat business, or explore different definitions of repeat business, we analyze the intervals between the creation dates of new requisitions from the same client.

The Pandas data analysis library offers powerful tools for conducting time series analysis. When working on unevenly spaced time series, we have found the shift() and transform() DataFrame methods particularly helpful. Many of the examples we found on the web applied these methods only to small, artificial datasets. Determining how best to apply them to real datasets was not always as straightforward as we would have hoped.
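As a rough illustration of the kind of usage described above, the following minimal sketch computes the gap between consecutive events within a group using shift(), then attaches a per-group statistic to every row using transform(). The column names (season, air_date, days_since_prev, season_mean_gap) and the dates are made up for illustration and are not taken from the tutorial dataset.

```python
import pandas as pd

# Hypothetical episode log: column names and dates are invented for this sketch.
episodes = pd.DataFrame({
    "season": [1, 1, 1, 2, 2, 2],
    "air_date": pd.to_datetime([
        "2017-01-01", "2017-01-08", "2017-01-29",
        "2017-09-03", "2017-09-10", "2017-09-17",
    ]),
})

# Sort so that "previous row" means "previous episode in the same season".
episodes = episodes.sort_values(["season", "air_date"])
by_season = episodes.groupby("season")["air_date"]

# shift(1) moves each value down one row within its group, so subtracting it
# gives the gap since the previous episode (missing for a season's first episode).
episodes["days_since_prev"] = (episodes["air_date"] - by_season.shift(1)).dt.days

# transform() returns a result aligned with the original rows, so a per-season
# statistic can sit next to each individual gap for comparison.
episodes["season_mean_gap"] = (
    episodes.groupby("season")["days_since_prev"].transform("mean")
)

print(episodes)
```

Grouping before shifting keeps the first event of one group from being compared with the last event of the previous group, which is an easy mistake to make when the same pattern is applied to logs covering many clients or recruiters.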

Rather than use internal proprietary data to illustrate how these methods can be applied to unevenly spaced time series, we will use a publicly available dataset of episodes of The Simpsons at data.world. In doing so, we will also provide an introduction to using the data.world API.

The purpose of this tutorial is to

Provide a brief, focused primer on some basic aspects of Pandas
Provide an overview of data.world datasets and accessing them via the API
Show how advanced Pandas tools can be used for analyzing unevenly spaced time series data

Participants will be best prepared for this tutorial if they

Understand Python basics
Have Python 2 or Python 3 installed on their computers
Install the latest versions of Pandas and Jupyter Notebook (recommended: use Anaconda)
Install the data.world Python API (pip install git+https://github.com/datadotworld/data.world-py.git)
Create a data.world account and an API key via the data.world Advanced Settings page

Update: Jupyter notebooks associated with the tutorial have been uploaded to a GitHub repository.

Wednesday 10:00 AM–12:00 PM in Track 2 - Baker

Joe McCarthy
