In this talk we explore how to build Machine Learning systems that can learn "continuously" from their mistakes (feedback loop) and adapt to an evolving data distribution.
In many Machine Learning (ML) problems, there is an implicit assumption: training and test data come from the same stationary distribution. Any ML model developed and then deployed in a production environment might make mistakes either because (1) it did not learn the concept correctly, or (2) the data distribution is non-stationary (concept drift: today, more data is organized as data streams rather than static databases, and it is unrealistic to expect data distributions to stay stationary over long periods of time).
Won't it be great to have models that can learn “continuously” from their mistakes (feedback loop) and adapt to an evolving data distribution? While there is a plethora of literature on building models with higher accuracies on test data, this isn't the case when it comes to building ML systems that learn from their mistakes and adapt on the fly. Such systems are often called “Learning Machines”. Currently, the standard way to handle feedback and drift is to monitor the performance of the model; if the performance drops below acceptable levels, the model in production is replaced with a new model trained on the collected feedback and recent data.
In this talk we describe our efforts towards building an ML system that can learn continuously and adapt on the fly to an evolving data distribution. The focus of this talk is on the last leg of the loop, as highlighted in the figure below:
The problem of instantaneous incorporation of feedback can be modeled as online learning: the model sees a data point x and makes a prediction y'. The environment then reveals the true label y. The model is correct if y = y' and incorrect if y != y'. Whenever the model is wrong, the data point (x, y) is sent back to the model as “feedback”.
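A minimal sketch of this prediction-feedback loop is shown below. The `model` object and its `predict`/`update` methods are illustrative placeholders, not the actual production interface.

```python
# Minimal sketch of the online feedback loop described above.
# `model` is any object exposing predict(x) and update(x, y); the names are
# illustrative, not the actual production interface.

def online_loop(model, stream):
    mistakes = 0
    for t, (x, y) in enumerate(stream, start=1):
        y_pred = model.predict(x)       # model sees x and makes a prediction y'
        if y_pred != y:                 # environment reveals the true label y
            mistakes += 1
            model.update(x, y)          # wrong prediction: send (x, y) back as feedback
        yield mistakes / t              # running mistake rate, useful for monitoring
```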
Ideally, you would want the model to incorporate this feedback instantly and keep adapting to the evolving distribution without a full retrain.
We borrow ideas from algorithms suited to online settings (the Perceptron, Crammer's PA-II variants) to build a fast-learning per-user statistical model; the dearth of per-user data leads to temporal retention, and we call this the Local model. In addition, we perform online passive-aggressive updates to combat the high degree of noise in the training set: each message is used to update the model, and the passive and aggressive parameters let us tweak the learning rate of the local model. Further, we use deep neural nets to build a model where a glut of data is available, leading to persistent retention; we call this the Global model.
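The sketch below shows a PA-II-style binary classifier in the spirit of Crammer et al., where the aggressiveness parameter C plays the role of the learning-rate knob mentioned above. It is a simplified illustration, not the exact production model; labels are assumed to be in {-1, +1}.

```python
import numpy as np

# Hedged sketch of a Passive-Aggressive II (PA-II) binary classifier.
# Larger C -> more aggressive updates on mistakes; smaller C -> more passive,
# which helps tolerate label noise.

class PAIIClassifier:
    def __init__(self, n_features, C=0.1):
        self.w = np.zeros(n_features)
        self.C = C

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def update(self, x, y):
        loss = max(0.0, 1.0 - y * (self.w @ x))          # hinge loss on this example
        if loss > 0.0:                                   # stay passive if margin is satisfied
            tau = loss / (x @ x + 1.0 / (2.0 * self.C))  # PA-II step size
            self.w += tau * y * x
```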
We use Cassandra to store and serve the models; it acts as a distributed file system backed by a database.
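As an illustration, per-user Local models could be stored and served with the DataStax Python driver as sketched below; the keyspace, table, and column names here are assumptions, not the actual production schema.

```python
import pickle
from cassandra.cluster import Cluster

# Illustrative sketch of storing/serving per-user local models in Cassandra.
# Keyspace `ml_models`, table `local_models`, and its columns are hypothetical.

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ml_models')

insert_stmt = session.prepare(
    "INSERT INTO local_models (user_id, model_blob) VALUES (?, ?)")
select_stmt = session.prepare(
    "SELECT model_blob FROM local_models WHERE user_id = ?")

def save_model(user_id, model):
    # Serialize the model object and write it as a blob keyed by user_id.
    session.execute(insert_stmt, (user_id, pickle.dumps(model)))

def load_model(user_id):
    # Fetch and deserialize the model for this user, if one exists.
    row = session.execute(select_stmt, (user_id,)).one()
    return pickle.loads(row.model_blob) if row else None
```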
We build on prior work on drift detection (Gama et al. 2004, García et al. 2006, and Minku et al. 2012) to refine the training window.
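Below is a hedged sketch of a DDM-style detector in the spirit of Gama et al. (2004): track the online error rate, remember its best (lowest) observed value, and flag a warning or drift when the current rate rises significantly above that minimum; the training window is then trimmed to the examples seen since the warning. Thresholds and the warm-up length are illustrative defaults.

```python
import math

# DDM-style drift detector sketch (after Gama et al. 2004).
# Feed it one boolean per prediction (True = mistake); it returns
# 'stable', 'warning', or 'drift'.

class DDM:
    def __init__(self, warn_factor=2.0, drift_factor=3.0, min_samples=30):
        self.warn_factor, self.drift_factor = warn_factor, drift_factor
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n, self.errors = 0, 0
        self.p_min, self.s_min = float('inf'), float('inf')

    def add(self, is_error):
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1 - p) / self.n)      # its standard deviation
        if self.n < self.min_samples:
            return 'stable'                      # warm-up period
        if p + s < self.p_min + self.s_min:      # remember the best point seen
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()                         # drift: restart statistics, shrink window
            return 'drift'
        if p + s > self.p_min + self.warn_factor * self.s_min:
            return 'warning'
        return 'stable'
```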
While both cost-sensitive learning and online learning have been studied extensively, there has been limited effort on dealing with the two simultaneously. Our key idea is to fuse online ensemble algorithms with state-of-the-art batch-mode cost-sensitive bagging/boosting algorithms. Within this framework, two separately developed research areas, cost-sensitive learning and online learning, are bridged together.
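One way to realize such a fusion is sketched below as cost-sensitive online bagging: each base learner is updated k ~ Poisson(lambda) times per example, with lambda scaled by the misclassification cost of the example's class so that costly classes are effectively oversampled online. This is an illustrative combination in the spirit of Oza-style online bagging, not the exact algorithm used in our system.

```python
import numpy as np

# Illustrative cost-sensitive online bagging ensemble.
# Costly classes get a larger Poisson rate, so base learners see them more often.

class CostSensitiveOnlineBagging:
    def __init__(self, base_learners, class_costs, rng=None):
        self.learners = base_learners          # e.g. a list of PAIIClassifier instances
        self.class_costs = class_costs         # e.g. {+1: 1.0, -1: 5.0}
        self.rng = rng or np.random.default_rng()

    def predict(self, x):
        votes = sum(m.predict(x) for m in self.learners)
        return 1 if votes >= 0 else -1         # simple majority vote over {-1, +1}

    def update(self, x, y):
        lam = self.class_costs[y]              # misclassification cost drives the rate
        for m in self.learners:
            for _ in range(self.rng.poisson(lam)):
                m.update(x, y)                 # online-bagging style repeated update
```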
While building models with better accuracies is critical (and that is what Machine Learning addresses), from the viewpoint of a practitioner in industry it is important to realise that Machine Learning is only one part of the end-to-end cycle of building ML systems. An important piece of this endeavour is to build machines that can learn continuously, i.e. learn from mistakes and adapt to evolving data. In this talk we elaborate on our attempts and results towards building such machines. To the best of our knowledge, although such systems have been built (Google, Facebook, etc.), there isn't much literature or many resources around them. We hope our talk will help and inspire the community to make serious attempts to bring better science to this art.