PyData New York City 2019 - Presentation: Clean Machine Learning Code: Practical Software Engineering Principles for ML Craftsmanship

As a community, our work in machine learning inherently depends on external tools and frameworks. However, we have no control over the development and maintenance of these external dependencies. The primary problem is that the more a machine learning pipeline becomes intertwined with a specific ML framework, the harder and more expensive it is to change. This leads ML teams to accumulate technical debt, with serious symptoms like entanglement, hidden feedback loops, undeclared consumers, and pipeline jungles.

However, from a business perspective, TensorFlow, PyTorch, and scikit-learn are details. MySQL, EMR, and Hive are details. Airflow, Kubeflow, and Dask are also details. There must be a way to decouple our ML applications from these frameworks and tools. This talk aims to cover the most important Clean Code design principles that can help evolve our ML engineering craftsmanship.

We will cover the following goals of a clean machine learning architecture:

Loose Coupling
High Cohesion
Change is Local
Make It Easy to Remove
Mind Sized Components

To achieve those goals, we will dive into the clean code design principles and explain how they relate to common ML tasks and components:

Single Responsibility Principle (SRP)
Open-Closed Principle (OCP)
Interface Segregation Principle (ISP)
Dependency Inversion Principle (DIP)
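
As a rough illustration of how the Dependency Inversion Principle might apply to an ML pipeline, the sketch below has the pipeline depend on a small abstraction that it owns, rather than on any particular framework. All names here (Regressor, MeanRegressor, TrainingPipeline) are hypothetical and are not taken from the talk; the stand-in model is deliberately framework-free so the example runs on its own.

```python
from typing import List, Protocol, Sequence


class Regressor(Protocol):
    """Hypothetical abstraction owned by the application, not by a framework."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None: ...

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]: ...


class MeanRegressor:
    """Framework-free stand-in: always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self._mean = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        self._mean = sum(y) / len(y)

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self._mean for _ in X]


class TrainingPipeline:
    """Depends only on the Regressor abstraction, never on a concrete framework."""

    def __init__(self, model: Regressor) -> None:
        self._model = model

    def run(
        self,
        X_train: Sequence[Sequence[float]],
        y_train: Sequence[float],
        X_new: Sequence[Sequence[float]],
    ) -> List[float]:
        self._model.fit(X_train, y_train)
        return self._model.predict(X_new)


# The concrete model is injected from the outside, so it is easy to swap or remove.
pipeline = TrainingPipeline(MeanRegressor())
predictions = pipeline.run([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0], [[10.0], [20.0]])
```

Under this arrangement, a model built on a real framework would sit behind a thin adapter exposing the same two methods, so replacing the framework would be a local change to the adapter and would leave TrainingPipeline untouched.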

It is well accepted that a good architecture maximizes the number of decisions not made. Creating good architecture requires extensive experience in the target domain. However, as of 2019, 40% of data scientists in the USA have fewer than five years of experience, which does not make these challenges any easier. At the same time, we are experiencing a boom in ML development and usage, similar to the software engineering expansions of the 2000s. The current expansion manifests itself as a menagerie of constructs, frameworks, and workflows, and it creates a multitude of integration challenges that remind us of good old software engineering problems. Some challenges of ML engineering are indeed new. However, the majority of the software engineering concerns have a historical smell. Going back to the basics of good software engineering can help with today’s ML engineering problems.

This talk will help the audience apply the principles of clean machine learning code, and escape the vicious cycle of ML technical debt.

Tuesday 11:40 AM–12:20 PM in Central Park West (6501)

Moussa Taifi Ph.D.
