Beyond the usual concerns of software development, machine learning development comes with additional challenges. These include trying multiple algorithms and parameters to get the best results, tracking these runs for reproducibility, and moving the model to diverse deployment environments. This tutorial provides hands-on experience in managing the complete machine learning lifecycle using MLflow.
In theory, the crux of machine learning (ML) development lies in data collection, model creation, model training, and deployment. In reality, machine learning projects are not so straightforward. They are a never-ending cycle that iterates between improving the data, the model, and the evaluation. Unlike in traditional software development, ML developers experiment with multiple algorithms, tools, and parameters to optimize performance, and they need to track these experiments to reproduce their work. Furthermore, developers need to use many distinct systems to productionize models.
In this tutorial, we introduce MLflow, an open-source platform that aims to simplify the entire ML lifecycle while letting us use any ML library and development tool of our choice to reliably build and share ML applications. MLflow offers simple abstractions through lightweight APIs to package reproducible projects, track results, and encapsulate models so that they are compatible with existing tools, thereby accelerating ML lifecycles of any size.
With the help of an example, we will show how MLflow eases the bookkeeping of experiment runs and results across frameworks, the quick reproduction of runs on any platform (cloud or local execution), and the productionizing of models on diverse deployment tools.
At the end of this tutorial, you will be familiar with –
The purpose of the tutorial is to introduce the audience to MLflow and give a taste of the ML development lifecycle. It is intended to provide a survey of the MLflow platform that favors breadth over depth, and we leave the audience to experiment with it further through a takeaway exercise.