An introduction to using pandas for data analysis. Materials are available on GitHub. Please clone the repository and complete the setup before arriving.
The tutorial will focus on solving common problems in data analysis by writing clean, readable, efficient code. Pandas will be the primary tool, though integrations with other libraries like scikit-learn, statsmodels, and matplotlib will be demonstrated. The emphasis will be on gradually learning methods for massaging data into the correct form through real applications, rather than an exhaustive walk-through of pandas' API.
This tutorial is aimed at beginner and intermediate PyData users. Attendees should ideally have some experience with NumPy. The basics of NumPy and its relationship to pandas will be covered briefly. The core of the tutorial covers
After covering the operations outlined above, we'll (time permitting) look at some of the more specialized areas of pandas, including Categoricals, time-series analysis, hierarchical indexes, chunked / out-of-core processing, and data pipelines.
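To give a flavor of the specialized areas listed above, here is a minimal sketch that touches each one. The toy data, the column names (`when`, `city`, `sales`), and the helper `add_share` are all invented for illustration and are not taken from the tutorial materials.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this sketch only.
df = pd.DataFrame(
    {
        "when": pd.date_range("2024-01-01", periods=6, freq="D"),
        "city": ["Austin", "Boston", "Austin", "Boston", "Austin", "Boston"],
        "sales": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    }
)

# NumPy and pandas: a numeric column can be viewed as a NumPy array.
sales_array = df["sales"].to_numpy()

# Categoricals: store repeated labels as integer codes plus a lookup table.
city_cat = df["city"].astype("category")

# Time-series analysis: total sales in consecutive 2-day bins.
two_day_totals = df.set_index("when")["sales"].resample("2D").sum()

# Hierarchical indexes: index by (city, when), then select one outer level.
by_city_day = df.set_index(["city", "when"]).sort_index()
austin = by_city_day.loc["Austin"]


def add_share(frame):
    """Hypothetical pipeline step: add each row's share of total sales."""
    return frame.assign(share=frame["sales"] / frame["sales"].sum())


# Data pipelines: chain steps so the transformation reads top to bottom.
top = (
    df.pipe(add_share)
    .query("share > 0.2")
    .sort_values("share", ascending=False)
)

# Chunked processing: read a CSV two rows at a time and aggregate as we go.
# An in-memory buffer stands in for a file too large to load at once.
csv_buffer = io.StringIO(df.to_csv(index=False))
chunk_total = sum(
    chunk["sales"].sum() for chunk in pd.read_csv(csv_buffer, chunksize=2)
)
```

Each step stays short because the work is done by built-in pandas methods (`astype`, `resample`, `set_index`, `pipe`, `read_csv` with `chunksize`), not by hand-written loops over rows.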