PyData DC 2016 | Presentation: Promoting a data-driven culture in a world of microservices

Saturday 4:00 PM–4:45 PM in Room #1025 (1st Floor)

Promoting a data-driven culture in a world of microservices

Alex DeBrie, Kelly Burdine

Audience level: Intermediate

Description

At Hudl, we give every employee full access to our data warehouse, and over 50% of our employees have personally written a query against it. In this talk, I discuss our journey to democratize our data. I touch on technical and non-technical challenges, including the tools we use and the structure of our teams.

Abstract

What you'll learn

Tips for:

Exporting from production databases
Building reliable data pipelines
Choosing which tools to use
Spreading data literacy at your company

Summary

A few years ago, Hudl switched from a monolithic app to a microservices architecture. This switch was great for developer productivity and site reliability, but it made it harder to gain insight into user behavior. Data was siloed across many databases, so querying across services was difficult or impossible.

In July 2015, we created the Data Engineering team in order to make our data more accessible. The Data Engineering team built Fulla, an internal data warehouse containing data from 20+ production databases and billions of logs. Today, squads from all areas of the company use Fulla to track key metrics and trends, make decisions on priorities, and engage with users.

In this talk, I discuss all aspects of Fulla. I talk about some of our challenges, including dealing with large-scale MongoDB databases, as well as lessons learned on ensuring data quality and promoting a data-driven culture. Finally, I provide suggestions on tools to use, both paid and open-source. Tools discussed include Spark, Luigi, Redshift, re:dash, AWS Lambda, Sqoop, Hive, and more.