Friday 12:35–13:05 in Track 1

Data-driven and Test-driven product development with Airflow, Jupyter and (Py)Spark at Allegro

Tomasz Bartczak

Audience level:
Novice

Description

In this talk I will discuss how data-driven products are built at Allegro, with examples ranging from image quality classification to a search relevancy pipeline. Topics covered:

* Test-driven or data-driven development (a.k.a. Jupyter or IDE)
* Airflow as a data workflow and scheduler platform
* Example project pipelines
* Lessons learned

Abstract

In this talk I will discuss how data-driven products are built at Allegro, with examples ranging from image quality classification to a search relevancy pipeline. Topics that will be covered:

* Test-driven and data-driven development: how to go from a Jupyter notebook to a production scheduled batch job
* Automated testing of PySpark jobs
* Spark jobs: Python or Scala?
* Airflow as a data workflow and scheduler platform
* Metrics and monitoring
* Example project pipelines
* Lessons learned
