Friday 10:45–13:00 in Hall 7

Data Wrangling with Python

Katharine Jarmul

Audience level:
Novice

Description

Want to learn how to clean, investigate, explore and analyze your data using Python? This workshop will take you from using Python as a developer to the basics of using Python as a data wrangler. We will introduce several data science libraries, including pandas, as well as some basic charting and reporting libraries.

Abstract

Overview

In this tutorial, we'll take an exploratory look at how to perform data wrangling using Python. We assume participants have a working understanding of Python and some basic exposure to data analysis.

Participants will leave the class with a better understanding of useful Python libraries for data analysis, as well as some hands-on scripts built throughout the tutorial. With these first steps, they should feel comfortable integrating data analysis into their current Python projects.

Please download the GitHub repository before the tutorial.

Outline

Introduction to Data Wrangling with Python

  • How to ask questions
  • Where to find data
  • Why Python?

Initial setup

  • IPython Notebook
  • Getting the data

Importing data

  • Working with easy formats
  • Working with hard formats
  • APIs
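
As a rough illustration of what working with "easy formats" can look like, here is a minimal sketch using only the Python standard library. The sample data, field names and helper functions are invented for illustration and are not taken from the tutorial materials.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "city,population\nBilbao,345000\nBerlin,3500000\n"
JSON_TEXT = '[{"city": "Bilbao", "country": "ES"}, {"city": "Berlin", "country": "DE"}]'


def load_csv_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


csv_rows = load_csv_rows(CSV_TEXT)
json_records = load_json_records(JSON_TEXT)

# CSV values always arrive as strings, so numeric columns need explicit conversion.
for row in csv_rows:
    row["population"] = int(row["population"])
```

With real files, the same readers accept an open file object in place of the in-memory strings used here.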

Exploring data

  • Using pandas
  • Asking simple questions
  • Joining datasets

Analysing data

  • What is the data saying?
  • Standardization and normalization
  • Drawing conclusions
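
To make the standardization and normalization item concrete, here is a minimal sketch of two common rescalings, assuming z-score standardization and min-max normalization are the techniques meant. The function names and sample values are hypothetical, and the tutorial itself may use library routines instead.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        # All values are identical, so every z-score is defined here as 0.
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values, for illustration only.
sample = [2.0, 4.0, 6.0, 8.0]
z_scores = standardize(sample)
scaled = min_max_normalize(sample)
```

Z-scores are useful when comparing columns measured in different units, while min-max scaling keeps the original ordering and bounds the result, at the cost of being sensitive to outliers.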

Reporting your findings

  • Who is your audience?
  • Charting data
  • Interactive charts and graphs

Next steps

  • Where to go from here
  • Q&A
  • "Homework"