Saturday 11:45 AM–12:30 PM in Speakeasy

A Practical Introduction to Airflow

Matt Davis

Audience level:
Novice

Description

Airflow is a pipeline orchestration tool for Python that allows users to configure multi-system workflows that are executed in parallel across workers. I’ll cover the basics of Airflow so you can start your Airflow journey on the right foot. This talk aims to answer questions such as: What is Airflow useful for? How do I get started? What do I need to know that’s not in the docs?

Abstract

Airflow is a popular pipeline orchestration tool for Python that allows users to configure complex (or simple!) multi-system workflows that are executed in parallel across any number of workers. A single pipeline might contain bash, Python, and SQL operations. Because dependencies are specified between tasks, Airflow knows which tasks it can run in parallel and which must wait for others to finish. Airflow is written in Python, and users can add their own operators with custom functionality, doing anything Python can do.
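
To make the dependency idea concrete, here is a minimal sketch of how declared dependencies determine which tasks may run at the same time. It is not Airflow's scheduler or API; it uses only Python's standard-library graphlib module (Python 3.9+), and the task names and the parallel_batches helper are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group tasks into batches; tasks in one batch have no unmet dependencies.

    `dependencies` maps each task name to the set of tasks it must run after.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        # Every task returned here could be handed to a different worker.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Mark the whole batch finished so downstream tasks become ready.
        sorter.done(*ready)
    return batches


# Hypothetical pipeline: two independent extracts feed one transform,
# which feeds one load step.
pipeline = {
    "extract_a": set(),
    "extract_b": set(),
    "transform": {"extract_a", "extract_b"},
    "load": {"transform"},
}

for batch in parallel_batches(pipeline):
    print(batch)
```

In this sketch the two extract tasks land in the same batch because neither depends on the other, while transform and load must each wait for the batch before them.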

Moving data through transformations and from one place to another is a big part of data science/engineering, but there are only two widely used orchestration systems for doing so that are written in Python: Luigi and Airflow. We’ve been using Airflow (http://pythonhosted.org/airflow/) for several months at Clover Health and have learned a lot about its strengths and weaknesses. We use it to run several pipelines multiple times per day. One includes over 450 heavily linked tasks!

This talk is a practical introduction to Airflow that gives people the information they need to decide whether Airflow is right for them and to get started. We will cover topics such as:

  • How Airflow schedules run
  • How to build pipelines and tasks
  • Built-in capabilities and UI
  • Extension options
  • Operations and deployment

About Clover Health: Clover Health is reinventing the health insurance model by using its data and analytics platform to identify at-risk members and partner with providers to accelerate care coordination, improve health outcomes and reduce avoidable costs. Built with technology at its core, Clover aggregates and structures data from a wide range of sources – from primary care providers and lab results, to customer service interactions and home visits – for continuous, real-time monitoring. For more information, visit www.cloverhealth.com.