Saturday 4:45 PM–5:30 PM in Room #370B/C (3rd Floor)

Building Continuous Learning Systems

Anuj Gupta

Audience level:
Intermediate

Description

In this talk we explore how to build Machine Learning Systems that can learn "continuously" from their mistakes (feedback loop) and adapt to an evolving data distribution.

Abstract

Problem Statement

In many Machine Learning (ML) problems, there is an implicit assumption: training and test data come from the same stationary distribution. Any ML model developed and thereafter deployed in a production environment might make mistakes either because (1) it did not learn the concept correctly, or (2) the data distribution is non-stationary (concept drift; today more data is organized as data streams rather than static databases, and it is unrealistic to expect data distributions to stay stationary over a long period of time).

Won't it be great to have models that can learn “continuously” from their mistakes (feedback loop) and adapt to an evolving data distribution? While there is a plethora of literature on building models with higher accuracies on test data, this isn't the case when it comes to building ML systems that learn from their mistakes and adapt on the fly. Such systems are often called “Learning Machines”. Currently, the standard way to handle feedback and drift is to monitor the performance of the model; if the performance drops below acceptable levels, replace the current model in production with a model newly trained on the collected feedback & recent data.

In this talk we describe our efforts on building ML systems that can learn continuously and adapt on the fly to an evolving data distribution. The focus of this talk is on the last leg of the loop, as highlighted in the figure below:

[Figure: Focus of this work]

Applied domains

  • Introduce non-stationary distributions with examples (we'll observe the misclassification rate on our data over a long period of time).
  • Identify shifts in data distribution & labels (for example, we will see how customer preferences (labels) change over time, and the fallacy of believing that our training set represents the population when in reality it is usually a random sample of the current population).
  • Identify noise in data
  • Temporal nature of data
  • Relevant domains that can be addressed with similar learning techniques: Social media, Monitoring and Anomaly detection, Predictions and recommendations
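To make the first point concrete, here is a minimal, hypothetical sketch (synthetic data, not the datasets discussed in the talk) of a stream whose concept flips midway; a model trained on the old concept sees its misclassification rate jump after the drift:

```python
import numpy as np

rng = np.random.default_rng(0)

def stream(n, drift_at):
    """Yield (x, y) pairs; the true decision boundary flips after `drift_at` steps."""
    for t in range(n):
        x = rng.normal(size=2)
        boundary = x[0] + x[1] if t < drift_at else x[0] - x[1]  # concept drift
        yield x, int(boundary > 0)

# A fixed model trained on the old concept: predicts sign(x0 + x1).
errors, window = [], 500
for t, (x, y) in enumerate(stream(10_000, drift_at=5_000)):
    y_hat = int(x[0] + x[1] > 0)
    errors.append(y_hat != y)
    if (t + 1) % window == 0:
        print(f"t={t+1:5d}  error rate over last {window}: {np.mean(errors[-window:]):.2f}")
```

The error rate stays near zero until the drift point and then climbs towards 0.5, which is exactly the kind of signal a static model silently suffers from.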

Towards a Solution:

Modeling the problem:

The problem of instantaneous incorporation of feedback can be modeled as Online Learning: the model sees a data point (x) and makes a prediction y'. The environment reveals the true label y. The model is correct if (y = y') and incorrect if (y != y'). In case the model is wrong, the data point (x, y) is sent to the model as “feedback”.

Ideally you would want the model:

  1. To incorporate this into its learning, i.e. the model updates its parameters/weights such that if this data point (x, y) is (ever) presented again, the model outputs the correct label y.
  2. To not hurt its accuracy on other data points, i.e. for all points where the model was predicting correctly prior to incorporating this feedback, it must continue to do so afterwards. In short, in trying to acquire new learning, your model should not forget old (correct) learnings.
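As a toy illustration of this protocol (a sketch, not our production code), a minimal perceptron-style learner predicts, receives the true label from the environment, and updates only when it is wrong:

```python
import numpy as np

class OnlinePerceptron:
    """Minimal online learner: predict, receive the true label, update on mistakes."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def feedback(self, x, y):
        # Perceptron rule: only update when the prediction was wrong,
        # nudging w toward classifying (x, y) correctly next time.
        if self.predict(x) != y:
            self.w += y * x

model = OnlinePerceptron(dim=3)
for x, y in [(np.array([1.0, 0.2, -0.5]), 1), (np.array([0.1, -1.0, 0.4]), -1)]:
    y_pred = model.predict(x)   # model sees x and makes a prediction
    if y_pred != y:             # environment reveals the true label y
        model.feedback(x, y)    # mistake: (x, y) is fed back to the model
```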

Divide and Conquer

We borrow ideas from algorithms suitable for online settings (Perceptron, Crammer's PA-II variant) to build a fast-learning per-user statistical model where data is scarce, giving temporal retention - we call this the Local model. In addition, we perform online passive-aggressive updates to combat the high degree of noise in the training set: each message is used to update the model, and the aggressiveness parameter is used to tweak the learning rate of the local model. Further, we use Deep Neural Nets to build the model where a glut of data is available, giving persistent retention - we call this the Global model.
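For concreteness, here is a minimal sketch of the PA-II update rule (Crammer et al.); the aggressiveness parameter C is what trades how fast the local model adapts against its robustness to noisy feedback. The function and variable names are illustrative, not our actual code:

```python
import numpy as np

def pa2_update(w, x, y, C=0.1):
    """One passive-aggressive (PA-II) update; C trades adaptation speed vs. noise robustness."""
    loss = max(0.0, 1.0 - y * (w @ x))        # hinge loss on the incoming example
    if loss == 0.0:
        return w                              # passive: already correct with margin
    tau = loss / (x @ x + 1.0 / (2.0 * C))    # PA-II step size, capped by C
    return w + tau * y * x                    # aggressive: move just enough toward (x, y)

# Per-user "local" model updated from each feedback message.
w = np.zeros(4)
for x, y in [(np.array([0.5, -1.0, 0.3, 0.0]), 1),
             (np.array([-0.2, 0.8, -0.1, 1.0]), -1)]:
    w = pa2_update(w, x, y, C=0.1)
```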

We use Cassandra to store and serve models; it acts as a distributed file system with a database.
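A rough sketch of what storing and serving per-user models against Cassandra can look like, assuming a hypothetical models table and pickle serialization (the keyspace, schema and serialization format are illustrative):

```python
import pickle
from cassandra.cluster import Cluster  # DataStax Python driver

# Hypothetical schema: models(user_id text PRIMARY KEY, payload blob, version int)
session = Cluster(["127.0.0.1"]).connect("learning_machines")

def save_model(user_id, model, version):
    """Serialize the model and upsert it for this user."""
    session.execute(
        "INSERT INTO models (user_id, payload, version) VALUES (%s, %s, %s)",
        (user_id, pickle.dumps(model), version),
    )

def load_model(user_id):
    """Fetch and deserialize the latest stored model for this user, if any."""
    row = session.execute(
        "SELECT payload FROM models WHERE user_id = %s", (user_id,)
    ).one()
    return pickle.loads(row.payload) if row else None
```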

We build on the work on drift detection (Gama et al. 2004, García et al. 2006, and Minku et al. 2012) to refine the training window.
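The drift detection method (DDM) from Gama et al. (2004) can be sketched roughly as follows: it tracks the online error rate and flags a warning or drift when that rate rises significantly above its historical minimum, which is the signal used to shrink the training window:

```python
import math

class DDM:
    """Drift Detection Method (Gama et al., 2004): monitor the online error rate."""
    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def add(self, wrong):
        """Feed one prediction outcome (wrong=True/False); return the detector state."""
        self.n += 1
        self.errors += int(wrong)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s      # remember the best error level seen so far
        if self.n < self.min_samples:
            return "stable"
        if p + s > self.p_min + 3 * self.s_min:
            return "drift"                     # shrink the training window / retrain
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"                   # start buffering recent examples
        return "stable"
```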

Ensemble Manager

While both cost-sensitive learning and online learning have been studied extensively, the effort in dealing with these two issues simultaneously is limited. Our key idea is based on a fusion of online ensemble algorithms with state-of-the-art batch-mode cost-sensitive bagging/boosting algorithms. Within this framework, two separately developed research areas - cost-sensitive learning and online learning - are bridged together.
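As one illustration of such a fusion (a sketch assuming Oza-style online bagging, not necessarily the exact algorithm we use), each base learner is updated k ~ Poisson(cost) times per example, so misclassification costs directly shape how often a learner sees an example:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

class OnlineCostSensitiveBagging:
    """Online bagging with cost-aware Poisson weighting over incremental base learners."""
    def __init__(self, n_models=5, classes=(0, 1)):
        self.classes = np.array(classes)
        self.models = [SGDClassifier() for _ in range(n_models)]

    def partial_fit(self, x, y, cost=1.0):
        # Each base learner sees the example k ~ Poisson(cost) times,
        # so costlier examples influence the ensemble more heavily.
        for m in self.models:
            for _ in range(rng.poisson(cost)):
                m.partial_fit(x.reshape(1, -1), [y], classes=self.classes)

    def predict(self, x):
        fitted = [m for m in self.models if hasattr(m, "classes_")]
        votes = [m.predict(x.reshape(1, -1))[0] for m in fitted]
        return max(set(votes), key=votes.count) if votes else self.classes[0]
```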

Summary

While building models with better accuracies is critical (and that's what Machine Learning addresses), from the viewpoint of a practitioner in industry it's important to realise that Machine Learning is only a part of the end-to-end cycle of building ML systems. An important piece of this endeavour is to build machines that can learn continuously - i.e. to learn from mistakes and adapt to evolving data. In this talk we elaborate our attempts and results towards building such machines. To the best of our knowledge, though such systems have been built (Google, Facebook, etc.), there isn't much literature or many resources around them. We hope our talk will help and inspire the community to make serious attempts to bring better science to this art.