Saturday 11:00 AM–11:45 AM in Fairness in AI - Room 100D/E

Webscraping Responsibly for Social Science Research

Graham MacDonald

Audience level:
Novice

Description

At the Urban Institute, our Data Science team works with social scientists every day to elevate the debate around social and economic policy. Webscraping in Python is one way we help provide evidence and support research. In this talk, I’ll walk through the projects we work on, the tools we’ve built, and the organizational policy we developed to support responsibly collected web data for research.

Abstract

Tools

In this section, I'll introduce our Site Monitor tool, built by Urban's Jeff Levy, which lets us visually monitor the impact of any Python-based webscraping program. I'll also cover our internal webscraping library, which lets us reuse best practices from past projects and save time.
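As a rough illustration of the kind of practice a shared scraping library can encode, here is a minimal sketch in Python. It is not Urban's library or the Site Monitor tool, whose interfaces are not described here. The class name `PoliteScraper`, the `research-bot` user agent, and the one-second default delay are all assumptions made for the example. It uses only the standard library's `urllib.robotparser` to honor robots.txt rules and to space out requests.

```python
import time
import urllib.robotparser


class PoliteScraper:
    """Hypothetical helper: honors robots.txt rules and spaces out requests."""

    def __init__(self, robots_lines, user_agent="research-bot", min_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        # Parse robots.txt content that has already been fetched and split
        # into lines (fetching is left to the caller in this sketch).
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)
        self.user_agent = user_agent
        # Use the site's Crawl-delay if it is longer than our own minimum.
        crawl_delay = self.parser.crawl_delay(user_agent)
        self.delay = max(min_delay, crawl_delay or 0)
        # Clock and sleep are injectable so the pacing logic can be tested.
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def allowed(self, url):
        """Return True if robots.txt permits this user agent to fetch url."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait(self):
        """Block until at least self.delay seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper built on this sketch would call `allowed(url)` before each request, skip any URL that returns `False`, and call `wait()` immediately before fetching so that requests to a site are never closer together than the chosen delay.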

Projects

In this section, I'll describe two example projects in which we used Python-based webscraping to collect data for social science research. The first is a study that used webscraped court data to inform an estimate of the total number of Washington, DC residents with criminal backgrounds. The second used Python-based methods to automate the download, extraction, and conversion of more than 75,000 files into an efficient big data format, giving researchers easy access to valuable Longitudinal Employer-Household Dynamics data.

Policy

To conclude, I'll give an overview of the Institute-wide policy that we wrote to guide the responsible use of webscraping for research purposes.
