Data scientists strive to experiment as quickly, easily, and effectively as possible. The difference between a 30-second local code change and a 10-minute CI/CD pipeline can cost days of productivity over many iterations. In this talk, Rappi Data Engineer Gonzalo Diaz and Apache Airflow Committer Daniel Imberman describe a process for building a simple yet massively scalable data pipeline with Jupyter and Airflow.
This tutorial presents participants with a bounded ecosystem built on Jupyter and Airflow, briefly explaining the key components and how they interact. This foundation will help participants structure their experiments and prototypes, build synergy within their teams, and deliver value quickly in their companies using standard open source tools. The tutorial also covers scalability using Kubernetes and Astronomer.io, as illustrated in the sketch below.
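As a taste of the pattern the tutorial covers, here is a minimal sketch of wiring a Jupyter notebook into an Airflow pipeline via the papermill provider. The DAG name, notebook paths, and parameters are hypothetical placeholders, and the snippet assumes Airflow 2.4+ with the apache-airflow-providers-papermill package installed.

```python
import pendulum
from airflow import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator

# Minimal sketch: run a parameterized notebook once a day,
# injecting the logical run date as a notebook parameter.
with DAG(
    dag_id="notebook_experiment",  # hypothetical DAG name
    schedule="@daily",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
):
    PapermillOperator(
        task_id="run_experiment_notebook",
        input_nb="notebooks/experiment.ipynb",         # assumed input path
        output_nb="output/experiment_{{ ds }}.ipynb",  # one executed copy per run
        parameters={"run_date": "{{ ds }}"},           # available inside the notebook
    )
```

The same DAG scales out unchanged on Kubernetes: running Airflow with the KubernetesExecutor (as managed platforms such as Astronomer.io do) launches each task in its own pod, so notebook runs can fan out across a cluster.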
The tutorial is aimed at data engineers, data scientists, and anyone working with data who has beginner-to-intermediate programming knowledge.
A complete example will be available in a public repository.