At ING we categorise thousands of financial transactions per second in real time to help our customers get an overview of their income and spending. In this talk Tijl will discuss the constraints within which we have to operate, how we bootstrapped the algorithm so that we had something to go live with, and how we use Bayesian inference and bandits to help the algorithm learn from customer feedback by itself.
As part of our personal finance management functionality, ING categorises transactions in real time for customers who want insight into where they spend their money. In this talk, aimed at anybody who would like to be inspired by a real-life, large-scale data science use case, I’ll elaborate on 1) the tools and constraints within which we operate, 2) how we bootstrapped the algorithm so that we could go live to customers with acceptable quality, and 3) the Bayesian-inspired functionality we built so that the categorisation algorithm learns by itself from those customers’ feedback.
The system, which was developed in-house, is split into a batch part and a real-time part. The batch part was written in PySpark and uses both implicit and explicit customer feedback to adjust the predicted category for each counterparty. We use Bayesian inference to estimate category probabilities and bandits to provoke more varied feedback from customers. The real-time part uses Apache Flink and PMML to apply the output of the batch part, categorising transactions as they happen.
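The abstract does not say which probabilistic model or bandit policy the batch part uses. As an illustration only, here is a minimal sketch assuming a Dirichlet-multinomial model per counterparty: the bootstrapped category acts as a prior, explicit feedback (a customer recategorising) adds a full pseudo-count, and implicit feedback (a customer seeing a category and leaving it) adds a smaller one. Thompson sampling, one common Bayesian bandit policy, then decides which category to show, so that less certain categories are occasionally shown and receive feedback. The category list, the weights, and all class and function names are hypothetical, and this is plain Python rather than a PySpark job.

```python
import random
from dataclasses import dataclass, field

# Hypothetical category set and feedback weights (not taken from the talk)
CATEGORIES = ["groceries", "transport", "housing", "leisure"]
EXPLICIT_WEIGHT = 1.0   # customer actively changed the category
IMPLICIT_WEIGHT = 0.1   # customer saw the category and did not change it


@dataclass
class CounterpartyBelief:
    """Dirichlet posterior over categories for a single counterparty."""
    alpha: dict = field(default_factory=dict)

    @classmethod
    def from_prior(cls, bootstrap_category, strength=2.0, base=0.5):
        # Prior pseudo-counts: a little mass on every category,
        # extra mass on the category produced by the bootstrap step
        alpha = {c: base for c in CATEGORIES}
        alpha[bootstrap_category] += strength
        return cls(alpha)

    def update(self, category, weight):
        # Conjugate update: feedback adds pseudo-counts to the observed category
        self.alpha[category] += weight

    def posterior_mean(self):
        total = sum(self.alpha.values())
        return {c: a / total for c, a in self.alpha.items()}

    def most_likely(self):
        return max(self.alpha, key=self.alpha.get)

    def thompson_sample(self, rng):
        # Draw from Dirichlet(alpha) via independent Gamma(alpha_i, 1) draws.
        # Normalising would not change the argmax, so it is skipped here.
        draws = {c: rng.gammavariate(a, 1.0) for c, a in self.alpha.items()}
        return max(draws, key=draws.get)


# Usage: the bootstrap step guessed "leisure", but three customers
# explicitly recategorised to "groceries" and one left "leisure" as is
belief = CounterpartyBelief.from_prior("leisure")
for _ in range(3):
    belief.update("groceries", EXPLICIT_WEIGHT)
belief.update("leisure", IMPLICIT_WEIGHT)

print(belief.most_likely())  # → groceries
rng = random.Random(42)
shown = belief.thompson_sample(rng)  # usually "groceries", sometimes another category
```

Showing the posterior-mean winner every time would never collect feedback on alternatives; sampling from the posterior instead trades off exploiting the current best guess against exploring categories that are still uncertain.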
In this talk I hope to inspire you to make the most of the tools you have available and to get your hands dirty with streaming analytics, and to convince you that you don’t need deep learning to change your customers’ lives for the better.