Saturday 11:00 AM–11:45 AM in Room #1025 (1st Floor)

Creating Python Data Pipelines in the Cloud

Femi Anthony

Audience level:
Intermediate

Description

My talk will analyze the various approaches to creating data pipelines in the public cloud using Python. I will compare and contrast Python libraries such as Luigi and Airflow with native cloud frameworks such as Cloud Dataflow (Google) and AWS Data Pipeline, using each to create a real-world data pipeline in Amazon AWS and Google Compute Engine.

Abstract

Introduction

  • What is a data pipeline
  • Explanation of why we use data pipelines
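
As a rough illustration of what "data pipeline" means here, the following is a minimal sketch in plain Python of the extract, transform, load pattern. It is not taken from the talk: the sample data, the function names (`extract`, `transform`, `load`, `run_pipeline`) and the temperature conversion are all hypothetical, and the tools discussed in the talk would add scheduling, dependency tracking and retries on top of stages like these.

```python
import csv
import io

# Hypothetical sample input; a real pipeline would read from cloud storage or a database.
RAW_CSV = "city,temp_f\nBoston,68\nAustin,95\n"


def extract(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert each Fahrenheit reading to Celsius, rounded to one decimal place."""
    return [
        {"city": row["city"], "temp_c": round((float(row["temp_f"]) - 32) * 5 / 9, 1)}
        for row in rows
    ]


def load(rows):
    """Serialize the transformed rows back to CSV text (stands in for a real sink)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["city", "temp_c"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_pipeline(text):
    """Chain the three stages; each stage consumes the previous stage's output."""
    return load(transform(extract(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Keeping each stage a separate function mirrors how workflow tools tend to model a pipeline as a chain of dependent tasks, so that a failed stage can be rerun without repeating the ones before it.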

Tools of the trade

  • Luigi
  • Airflow
  • Native cloud data pipelines
    • Cloud Dataflow
    • AWS Data Pipeline

Description of a real-world data example

Implementation of the data pipeline using the tools above, and a comparison of the tools

Conclusions