Today, we are moving towards machine learning: making predictions and deriving insights from data. The first step for this is to have efficient processes in place for collecting data from various sources. Traditional ways of collecting data are tedious and cumbersome, and manually running scripts to extract, transform and load data costs a lot of time.
To make the process efficient, the data pipeline can be automated. Scripts to extract data can be scheduled using crontab. However, crontab has its own drawbacks, and one major challenge is monitoring. This is where Apache Airflow, an open source tool built by the Airbnb engineering team, helps. Airflow is a platform to programmatically author, schedule and monitor workflows.
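As a rough sketch of what "programmatically authoring" a workflow looks like (assuming Airflow 2.x import paths; the DAG name, schedule and tasks below are illustrative placeholders, not part of the talk material), a pipeline is written as a Python DAG:

```python
# Minimal illustrative Airflow DAG: an hypothetical extract -> load pipeline
# scheduled to run daily. Task logic is stubbed out with print statements.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from a source system.
    print("extracting data")


def load():
    # Placeholder: load transformed data into a target store.
    print("loading data")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Task ordering: extract must finish before load starts.
    extract_task >> load_task
```

Unlike a crontab entry, each run of such a DAG shows up in Airflow's web UI, where task status, logs and retries can be monitored.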
The talk aims to introduce the attendees to Apache Airflow and to how it can be used to author, schedule and monitor data pipelines.