PyData London 2016 | Presentation: 10 things I learned about writing data pipelines in Python and Spark.

Saturday 15:30–16:15 in LG6

10 things I learned about writing data pipelines in Python and Spark.

Ali Zaidi

Audience level: Intermediate

Description

Starting in Q4 2015, I wrote the financials data pipeline that collates ~200 data points and calculates ~300 metrics for ~80M account filings from ~11M private companies. In this talk, I will share what I learned.

Abstract

I am a Data Engineer at Duedil, a fintech company that enables access to public data about private companies.
Starting in Q4 2015, I wrote the financials data pipeline that collates ~200 data points and calculates ~300 metrics for ~80M account filings from ~11M private companies.
At the time of writing, this pipeline is in production: http://bit.ly/1T3CzDG, http://bit.ly/1Q8iBBq.
I built it with Python, Spark and loads of good fortune. I would like to share my journey with the PyData community, purely to give something back, as I have learned so much from the meetups.
My talk will include takeaways, patterns, anti-patterns, mistakes, and big mistakes that I made and learned from. I think it will be particularly useful for beginner-to-intermediate data wranglers.
