Imagine you need to choose ten articles out of hundreds in a way that maximizes your profit. It's not as easy as it seems. In this talk, we will explain how we prepare recommendations on the onet.pl home page for millions of users using a multi-armed bandit algorithm.
Multi-armed bandits are a powerful solution for a variety of optimization problems that demand a balance between exploiting existing knowledge about item performance and exploring to acquire new knowledge. That's why we would like to focus on the intuition behind the multi-armed bandit approach and its application to recommender systems, using the onet.pl home page as an example. We will also introduce the epsilon-greedy, UCB, and Thompson Sampling bandits, discuss their pros and cons, and show how to tune them in a simulated environment.
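To give a flavor of the exploration/exploitation trade-off discussed above, here is a minimal epsilon-greedy sketch run against a simulated environment. It is an illustration only, not the onet.pl production system: the arm click-through rates, the `epsilon` value, and the round count are made-up assumptions.

```python
import random

def epsilon_greedy(true_ctrs, epsilon=0.1, n_rounds=10_000, seed=42):
    """Simulate an epsilon-greedy bandit over arms with hypothetical
    Bernoulli click-through rates; return per-arm pull counts."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms      # how often each arm was shown
    clicks = [0.0] * n_arms   # total reward (clicks) per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # explore: pick a random arm to keep learning
            arm = rng.randrange(n_arms)
        else:
            # exploit: pick the arm with the best empirical CTR so far
            means = [clicks[i] / pulls[i] if pulls[i] else 0.0
                     for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i])
        # simulate a user click as a Bernoulli draw
        if rng.random() < true_ctrs[arm]:
            clicks[arm] += 1.0
        pulls[arm] += 1
    return pulls

# three hypothetical articles with different true CTRs
pulls = epsilon_greedy([0.02, 0.05, 0.10])
```

After enough rounds, the arm with the highest true CTR ends up receiving the large majority of pulls, while the ~10% exploration budget keeps the estimates of the other arms fresh. Tuning epsilon (and comparing against UCB or Thompson Sampling) in such a simulator is the kind of exercise the talk walks through.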