Tuesday 11:20 AM–12:00 PM in Central Park East 6501a (6th fl)

Learning in Cycles: Implementing Sustainable Machine Learning Models in Production

Andrew Therriault

Audience level:
Intermediate

Description

Sustainable machine learning models are built for cyclical applications that generate their own training data: think ad targeting, recommendation engines, compliance audits, or fundraising. We’ll discuss practical approaches and considerations for data scientists, analysts, and engineers working on these kinds of models in the wild.

Abstract

Machine learning textbooks tend to focus narrowly on specific algorithms or code without looking at the bigger picture. One key real-world application that's rarely covered is models that are regularly updated with new data resulting from their own earlier predictions. These models are built for cyclical applications that generate their own training data: think ad targeting, recommendation engines, compliance audits, or fundraising. Done poorly, repeatedly retrained models can amplify the errors and biases of their initial versions. Done right, they can learn from those mistakes over time, using the results of previous versions as new training data to keep the model fresh and productive over months or years of applied use. With examples from my own work in the political, nonprofit, and civic data science fields, this talk will introduce a framework for designing "sustainable" machine learning models that get better over time. We’ll discuss practical approaches and considerations for data scientists, analysts, and engineers working on these kinds of models in the wild.
