Friday October 29 7:30 PM – Friday October 29 8:00 PM in Talks I

An Intro to Workflow Management with Prefect

Kevin Kho

Prior knowledge:
No previous knowledge expected

Summary

As data pipelines become increasingly complex and interconnected, workflow management systems are being used to schedule and monitor their tasks. Prefect is an open-source workflow management system designed for large-scale data processes. We'll show how to get started with Prefect and how to run it on top of Dask in the cloud to parallelize workflows.

Description

Workflow management systems are used for scheduling and monitoring data pipelines. Their responsibilities include managing task dependencies, retrying failed tasks, and sending notifications to users. This talk will show data engineers and data scientists how to orchestrate their data workflows with Prefect. After this talk, attendees should understand the basics of workflow orchestration and how to start applying it to their own use cases.
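To make those responsibilities concrete, here is a minimal plain-Python sketch of what an orchestrator does: run tasks in dependency order, retry failures, and send a notification on each failure. It does not use Prefect's API. The names `run_workflow`, `tasks`, `dependencies`, and `notify` are hypothetical, chosen only for illustration, and the sketch assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_retries=2, notify=print):
    """Run callables in dependency order, retrying each one on failure.

    tasks:        maps a task name to a callable taking a dict of upstream results
    dependencies: maps a task name to the set of task names it depends on
    notify:       called with a message whenever an attempt fails (hypothetical hook)
    """
    results = {}
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        for attempt in range(1, max_retries + 2):
            try:
                results[name] = tasks[name](upstream)
                break
            except Exception as exc:
                notify(f"task {name!r} failed on attempt {attempt}: {exc}")
                if attempt == max_retries + 1:
                    raise  # retries exhausted, so the whole workflow fails
    return results


# Illustrative pipeline: extract fails once, then succeeds on the retry.
attempts = {"extract": 0}


def extract(_upstream):
    attempts["extract"] += 1
    if attempts["extract"] == 1:
        raise RuntimeError("transient error")
    return [1, 2, 3]


def transform(upstream):
    return [x * 10 for x in upstream["extract"]]


def load(upstream):
    return sum(upstream["transform"])


tasks = {"extract": extract, "transform": transform, "load": load}
dependencies = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
messages = []
results = run_workflow(tasks, dependencies, notify=messages.append)
```

A workflow management system provides this bookkeeping, along with scheduling and monitoring, so that pipeline code does not have to implement it by hand.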

Prefect is a modern, open-source workflow management system with native Dask support built in. Prefect can handle large-scale data pipelines made up of many small tasks, because users can use the Dask Executor to take advantage of Dask's millisecond-latency task scheduler. Using Dask also parallelizes Task execution and makes use of distributed compute with minimal overhead. In an interactive demo, we'll go over basic Prefect concepts such as Flows, Tasks, and Parameters. We'll then move on to more advanced topics such as mapping and conditional logic, which let us dynamically create Tasks inside a Flow. During the demo, we will deploy a Flow locally and then show how seamless it is to port the Flow to a Dask cluster in the cloud.
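The ideas behind mapping and conditional logic can be sketched in plain Python. This is an analogy only, not Prefect's API: a standard-library thread pool stands in for the Dask Executor, the function arguments play the role of Parameters, and the names `square` and `run_flow` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # A small unit of work, standing in for a single Task.
    return x * x


def run_flow(numbers, threshold=20, max_workers=4):
    # "Mapping": one task run per input element, so the number of runs
    # is only known once the input arrives. The pool runs them in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        squares = list(pool.map(square, numbers))

    total = sum(squares)

    # "Conditional logic": which downstream branch runs depends on an
    # upstream result.
    if total > threshold:
        return f"large total: {total}"
    return f"small total: {total}"


large = run_flow([1, 2, 3, 4])
small = run_flow([1, 2])
```

In this sketch the parallelism comes from swapping in a pool, while `square` itself stays unchanged. That mirrors the point of the demo, where the same Flow moves from local execution to a Dask cluster.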