Scheduling machine learning pipelines in a robust fashion is an important aspect of many data science projects. Apache Airflow is an industry-standard tool for authoring, scheduling, and monitoring complex data processing workflows. In this workshop, participants will get hands-on experience with Airflow and learn how to schedule their machine learning pipelines properly.
Over the past few years, data science teams within organizations have matured in their experimentation capabilities, and making machine learning pipelines production-ready has become the next major challenge. Apache Airflow is an industry-standard tool for authoring, scheduling, and monitoring complex, production-grade workflows, with a strong focus on data processing pipelines. The tool is highly customizable and integrates well with most modern working environments and data sources. Airflow workflows are written in Python, which matches the skill set of typical data science teams.

This workshop is useful for both data scientists and data engineers who want to schedule machine learning pipelines robustly. In the first, plenary part, participants will learn what Airflow is and when it is useful. We will cover the principles and architecture of Airflow and show some example workflows. After that, participants will create and deploy their own workflow to schedule a basic machine learning pipeline. For the more experienced participants, we offer a bonus assignment in which they leverage Airflow's plugin system to further customize their pipeline.

The necessary tools for this workshop will be provided; all you need is a laptop with an internet browser and basic Python knowledge.
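To give a flavour of what such a workflow looks like, here is a minimal sketch of an Airflow DAG, assuming a recent Airflow 2.x release with the TaskFlow API; the DAG id, schedule, and task bodies are illustrative placeholders, not the actual workshop material.

```python
# Minimal sketch of an Airflow DAG using the TaskFlow API (assumes Airflow 2.4+).
# The dag id, schedule, and task bodies are illustrative placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def ml_pipeline():
    @task
    def extract():
        # Stand-in for loading training data from a source system.
        return [1, 2, 3]

    @task
    def train(data):
        # Stand-in for fitting and persisting a model.
        print(f"training on {len(data)} records")

    # Passing the output of extract() into train() defines the task dependency.
    train(extract())


# Calling the decorated function at module level registers the DAG,
# so the Airflow scheduler can discover it in the DAGs folder.
ml_pipeline()
```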