Have you ever heard of Machine Learning versioning solutions? Have you ever tried one of them? And what about automation? Come with us and learn how to easily build versionable pipelines! This tutorial explains, through small exercises, how to set up a project using DVC and MLV-tools.
You're a data scientist. You have a bunch of analyses you performed in Jupyter Notebooks, but anything older than two months is useless because it never works when you reopen the notebook. You also cannot remember the dropout rate on the second-to-last layer of the convolutional neural network that gave great results two weeks ago and that you now want to deploy to production. Does that ring a bell?
You're a software engineer in a data science team. You can't live without Git. Reviews of readable files, tests, code analysis and CI are part of your daily routine. You used to think of Jupyter Notebooks only as a demo tool. You need every step of your work to be reproducible, even if you lose a server. And last but not least, you want to deliver to production something anyone can use.
This tutorial shows how to use MLV-tools to set up a development environment and deliver the project, while avoiding the frustration caused by segregated teams or differing points of view.
There is no magical solution, but compromises can be found. MLV-tools helps to:
Global goal: be able to easily set up your own project using MLV-tools
Attendees will be guided step by step through experiments on their own computers.
Goal: expose versioning, automation and reproducibility issues with Machine Learning projects.
Goal: understand how to version code, hyperparameters and data using Git and DVC pipelines.
Goal: easily use DVC on a Machine Learning project with MLV-tools.
Goal: see how the process fits everyday use cases.