Thursday, October 28, 11:00 AM – 12:30 PM in Workshop/Tutorial I

🦉DVC Showcase – Who Moved My Data? 🗂

Dean Pleban

Prior knowledge expected:
Git, Python

Summary

Building fully versioned & reproducible machine learning projects is important, but it's not an easy task. Getting it right can help you connect to your production pipelines, make sense of your data, and work more effectively as a team. If you want to learn how to use DVC to achieve reproducible & versioned ML projects, not just in theory but in practice, this talk is for you!

Description

Building fully versioned & reproducible machine learning projects is important, but not an easy task.

One tool that can help you get there is DVC. DVC is one of the most widely adopted solutions to manage data, model, and artifact versions, as well as build reproducible data pipelines. Its community has over 8000 members, and it's used by organizations all over the world.

If you have always wanted to understand how DVC works, when to use it, and how it can help your data science or machine learning workflow – this talk is for you.

In this talk, I will give an overview of the data science workflow and focus on the challenges of reproducibility and versioning. I will present DVC, an open-source tool built to handle these issues in data science projects. I'll explain the tool's simple operating principles and the benefits gained from using it. Then, I'll pop the hood and present some of DVC's internals, to make sure you get a good grasp of how it works. I'll discuss DVC's pros and cons, and present another open-source tool called FDS, which combines Git and DVC into a single, easier-to-use solution.
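
As a rough preview of the operating principle, here is a minimal Python sketch of content-addressed file caching, the general idea behind how DVC stores data versions: each file is saved under a path derived from the hash of its contents, and only the small hash needs to be tracked in Git. This is an illustration under simplifying assumptions, not DVC's actual code; the function names (file_md5, add_to_cache, checkout, demo) and the temporary cache directory are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> str:
    """Copy a file into the cache under a path derived from its content hash.

    Hypothetical helper: the returned checksum is the small piece of
    metadata that would be committed to Git in place of the data itself.
    """
    checksum = file_md5(path)
    target = cache_dir / checksum[:2] / checksum[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return checksum


def checkout(checksum: str, cache_dir: Path, dest: Path) -> None:
    """Restore the version of a file identified by its checksum."""
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], dest)


def demo() -> bytes:
    """Store two versions of a file, then restore the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "data.csv"

        data.write_bytes(b"version 1")
        v1 = add_to_cache(data, cache)

        data.write_bytes(b"version 2")
        add_to_cache(data, cache)

        checkout(v1, cache, data)  # roll the working copy back to version 1
        return data.read_bytes()
```

Calling demo() writes two versions of a file, caches both, and restores the first from its checksum. Because the cache path depends only on file content, adding the same data twice stores it once, and switching versions amounts to looking up a different checksum.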