Friday 11:00–12:30 in Tower Suite 1

Apache Airflow in the Cloud: Programmatically orchestrating workloads with Python

Satyasheel, Kaxil Naik

Audience level:
Intermediate

Description

An introduction to the basics of Apache Airflow and how to orchestrate workloads on Google Cloud Platform (GCP). A GCP environment will be provided; attendees just need to log in with a Google account.

Abstract

Apache Airflow is a workflow orchestration tool for Python, initially built at Airbnb and later open-sourced. It allows data engineers to configure multi-system workflows that are executed in parallel across any number of workers. A single pipeline may contain several kinds of operations, such as running a Python function, executing a Bash command, or submitting a Spark job to the cloud. Airflow itself is written in Python, and users can write their own custom operators in Python as well.
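To make this concrete, here is a minimal sketch of a two-task pipeline, assuming an Airflow 2.x installation; the DAG name, task ids, and task bodies are invented purely for illustration.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def extract():
        # Placeholder task body; a real pipeline would pull data from a source system.
        print("extracting data")


    with DAG(
        dag_id="airflow_tutorial",        # hypothetical name for illustration
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = BashOperator(task_id="transform", bash_command="echo transforming")

        # ">>" declares the dependency: transform runs only after extract succeeds.
        extract_task >> transform_task

Custom operators follow the same pattern: subclass BaseOperator and implement execute(), which a worker calls when the task instance runs. The operator below is a hypothetical example, not part of Airflow itself.

    from airflow.models import BaseOperator


    class GreetOperator(BaseOperator):
        """Hypothetical operator used purely to illustrate the pattern."""

        def __init__(self, name: str, **kwargs):
            super().__init__(**kwargs)
            self.name = name

        def execute(self, context):
            # Called by the worker when the task runs; self.log is Airflow's task logger.
            self.log.info("Hello, %s", self.name)
            return self.name

Inside a DAG, GreetOperator(task_id="greet", name="PyData") would then behave like any built-in operator.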

A data pipeline is a critical component of an effective data science product, and orchestrating pipeline tasks enables simpler development and more robust and scalable engineering.

In this tutorial, we will give a practical introduction to Apache Airflow, covering the basics of writing and scheduling pipelines and orchestrating workloads on GCP.

Prerequisite: basic familiarity with Python.
