Day 1 Plenary Hall 11:15 - 12:00

Talk: Principle of building robust data pipelines with Apache Airflow

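As companies collect more and more data, the design and management of data pipelines become increasingly important. Designing a data pipeline differs from writing an ordinary program, and the aspects to consider differ as well. This talk starts from the use cases for data pipelines, introducing common pipeline scenarios and the difficulties that frequently arise, such as task scheduling, state management, resuming from a previous state, backfilling, and task logging. It then lays out a set of design principles for pipeline designers to refer to and introduces Airflow, a widely used pipeline management tool, to show how these difficulties can be addressed.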

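1. What is a data pipeline
2. Common data pipeline scenarios
3. Common problems when designing data pipelines
4. Design principles for data pipelines
5. Introduction to Airflow (basic architecture, basic concepts)
6. Airflow in practice (building a DAG from scratch)
7. Advanced Airflow techniques (dependencies between tasks, splitting workflows and tasks, Airflow on k8s)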

Speaker: Bryan Yang

Hi, I'm Bryan, and I specialize in solving business problems with data and data science. I have worked with data for more than 10 years as a consultant, programmer, data engineer, and data scientist, and I am now a solution architect at LINE TV. Over these years my work has centered on a few things: defining the problem, finding the metrics and the way to measure them, and applying smart, intelligent methods to solve those problems. More specifically, I have built machine learning models that predict user behavior to increase the conversion rate; built streaming and batch data pipelines that simplify data processing and keep providing high-quality data to teams; and designed and managed data warehouses for the data team, as well as a machine learning platform that helps data scientists accelerate the ML process from the lab to the production environment.
