Thursday 13:10–13:40 in Main Track

Reproducible Machine Learning

Mateusz Opala

Audience level:
Intermediate

Description

Reproducibility is a cornerstone of the scientific method. In production Machine Learning especially, it is crucial to ensure that a hidden source of randomness is not the real reason for an improvement in model performance. In my talk I will elaborate on the importance of reproducibility and show how we build reproducible machine learning pipelines at Netguru.

Abstract

Reproducibility is a cornerstone of the scientific method. In production Machine Learning especially, it is crucial to ensure that a hidden source of randomness is not the real reason for an improvement in model performance. Although reproducibility in machine learning papers seems to be a must-have, it is still not standard practice.
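
As a minimal illustration of the point about hidden randomness, the sketch below uses a toy function, `train_and_score`, which is a hypothetical stand-in for a training run and is not taken from the talk. The base score of 0.80 and the noise level are made-up values. It shows that a fixed seed makes a run repeatable, while varying only the seed changes the score even though the "model" is unchanged.

```python
import random


def train_and_score(seed):
    """Toy stand-in for a training run (hypothetical, for illustration only)."""
    # A local generator keeps this run isolated from global random state.
    rng = random.Random(seed)
    # Made-up noise standing in for weight initialisation, shuffling, etc.
    noise = sum(rng.gauss(0, 0.01) for _ in range(10))
    return 0.80 + noise


# Same seed, same result: the run is reproducible.
same_a = train_and_score(seed=42)
same_b = train_and_score(seed=42)

# Different seeds: the score moves although nothing about the "model" changed.
scores = [train_and_score(seed=s) for s in range(5)]
spread = max(scores) - min(scores)

print("repeatable:", same_a == same_b)
print("spread across seeds:", spread)
```

If a claimed improvement is smaller than the spread across seeds, it may be an artefact of randomness, not a real gain. Recording the seed alongside the code and data versions is one simple way to make such comparisons checkable.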

Outline of talk:

  1. Definitions:
    • reproducibility
    • replicability
    • generalisability
  2. Motivation for achieving reproducibility
  3. Full reproducibility == Continuous Delivery for ML
  4. Changes in ML development process
    • code
    • data
    • models
  5. How do we manage change in the ML development process?
  6. Data versioning
    • Quilt Data
  7. Experiments management
    • MLflow / Polyaxon
  8. Summary
