Sunday 1:15 PM–2:00 PM in Room #1025 (1st Floor)

Open Data Dashboards & Python Web Scraping

Marie Whittaker

Audience level:
Novice

Description

Distilling a world of data down to a few key indicators can be an effective way of keeping an audience informed, and this concept is at the heart of a good dashboard. This talk will cover a few methods of scraping and reshaping open data for dashboard visualization, automating the boring stuff so you have more time and energy to focus on the analysis and content.

Abstract

This talk will cover a basic scenario of curating open data into visualizations for an audience. The main goal is to automate data scraping/downloading and reshaping. I use Python to automate data gathering, and Tableau and D3 as visualization tools -- but the process can be applied to numerous analytical/visualization suites.

I'll discuss situations where a dashboard makes sense (and where one doesn't). I will also make the case that automation makes data gathering and updating more seamless, but does not always make the data analysis smarter.

Some Python packages I'll cover for web scraping and downloading/reshaping open data include openpyxl, pandas, xlsxwriter, and BeautifulSoup. I'll also touch on APIs.
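As a rough illustration of the scrape-and-reshape step (not taken from the talk itself), the sketch below parses an HTML table with BeautifulSoup and reshapes it with pandas into the long format most dashboard tools prefer. The HTML snippet, the table id, the indicator names, and every number in it are made up for the example; a real script would fetch the page from the data source first.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML standing in for a page that publishes an indicator table.
# The indicator names and values are invented for illustration only.
HTML = """
<table id="indicators">
  <tr><th>Indicator</th><th>2014</th><th>2015</th></tr>
  <tr><td>Indicator A</td><td>7.8</td><td>6.9</td></tr>
  <tr><td>Indicator B</td><td>1,400</td><td>1,450</td></tr>
</table>
"""


def scrape_table(html, table_id):
    """Return the header row and the data rows of the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    return rows[0], rows[1:]


def to_long(header, rows):
    """Reshape a year-per-column table into one row per indicator and year."""
    wide = pd.DataFrame(rows, columns=header)
    long = wide.melt(id_vars="Indicator", var_name="Year", value_name="Value")
    long["Year"] = long["Year"].astype(int)
    # Strip thousands separators before converting the text to numbers.
    long["Value"] = long["Value"].str.replace(",", "", regex=False).astype(float)
    return long.sort_values(["Indicator", "Year"]).reset_index(drop=True)


header, rows = scrape_table(HTML, "indicators")
tidy = to_long(header, rows)
print(tidy)
```

The long layout (one row per indicator and year) is a common choice because both Tableau and D3 can filter and group on the `Indicator` and `Year` columns without further restructuring; other layouts may suit other tools better.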

The case study for this talk, and the source of the example data visualizations, will be the Economic Intelligence Dashboard (http://open.dc.gov/economic-intelligence/). This public dashboard is a curated collection of open data from disparate sources which, taken together, form a narrative of the economic health of DC. Because my main focus is economic data, throughout the talk I will touch on the best (most authoritative; easiest to scrape) sources of economic data and of data on the District and the region.