In this talk we will explain how we solved Kaggle's Uncertain Bags competition. We will show you how to formulate it as an optimisation problem and solve it using a combination of Google's Operations Research Tools and the PyMC Bayesian modelling suite. This talk assumes a basic knowledge of Linear Programming and Monte Carlo optimisation.
Every year around Christmas, Kaggle hosts a competition at the crossroads of data science and optimisation. Last year's challenge[1] was about helping Santa fill 1000 bags with 9 types of gifts, packing those bags as full as possible while avoiding overfilling them. To complicate matters, the weight of any specific gift is unknown; we are given only the distribution from which each gift's weight was sampled.
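To give a feel for the setup, here is a minimal sketch of how gift weights might be simulated. The distributions below are illustrative stand-ins in the spirit of those published in the competition description, not the official ones:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative weight samplers per gift type (stand-ins for the
# distributions given in the competition description).
GIFT_SAMPLERS = {
    "horse":  lambda: max(0.0, rng.normal(5, 2)),
    "ball":   lambda: max(0.0, 1 + rng.normal(1, 0.3)),
    "bike":   lambda: max(0.0, rng.normal(20, 10)),
    "train":  lambda: max(0.0, rng.normal(10, 5)),
    "book":   lambda: rng.chisquare(2),
    "doll":   lambda: rng.gamma(5, 1),
    "blocks": lambda: rng.triangular(5, 10, 20),
    "gloves": lambda: 3 + rng.random() if rng.random() < 0.3 else rng.random(),
    "coal":   lambda: 47 * rng.beta(0.5, 0.5),
}

# Five simulated horse weights -- we never see the true weight of any gift.
print([GIFT_SAMPLERS["horse"]() for _ in range(5)])
```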
We will show that we can state this as a multiple knapsack problem (and explain what that is) and how it can be solved with an Integer Linear Programming solver. To get around only having distributions of weights, we estimate the expected score of candidate bags (combinations of gift types) via Monte Carlo sampling, taking into account the possibility of a bag tearing when overfilled. We will also explain a bit about Google's Operations Research Tools and how they make our job surprisingly easy.
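A rough sketch of both steps, under the assumption that an overfilled bag tears and scores zero (we use a 50-pound limit here for illustration): first estimate each candidate bag's expected score by simulation, then let an ILP solver from OR-Tools decide how many copies of each candidate bag to pack. The function names and the candidate-bag representation are our own, not the library's:

```python
import numpy as np
from ortools.linear_solver import pywraplp

BAG_LIMIT = 50.0  # assumed weight limit; a heavier bag tears and scores zero


def expected_bag_score(bag, samplers, n_sim=10_000):
    """Monte Carlo estimate of a bag's expected score.

    `bag` maps gift type -> count. A simulated bag scores its total
    weight if it stays under the limit, and zero if it tears.
    """
    totals = np.zeros(n_sim)
    for gift, count in bag.items():
        for _ in range(count):
            totals += np.array([samplers[gift]() for _ in range(n_sim)])
    return np.where(totals <= BAG_LIMIT, totals, 0.0).mean()


def pack(candidate_bags, scores, gift_counts, n_bags=1000):
    """ILP: choose how many copies of each candidate bag to use,
    respecting the gift inventory and the total number of bags."""
    solver = pywraplp.Solver.CreateSolver("CBC")
    x = [solver.IntVar(0, n_bags, f"x{i}") for i in range(len(candidate_bags))]

    # Use at most the available number of bags.
    solver.Add(sum(x) <= n_bags)

    # Respect the inventory of each gift type.
    for gift, available in gift_counts.items():
        solver.Add(sum(bag.get(gift, 0) * x[i]
                       for i, bag in enumerate(candidate_bags)) <= available)

    # Maximise the total expected score over all packed bags.
    solver.Maximize(sum(s * v for s, v in zip(scores, x)))
    assert solver.Solve() == pywraplp.Solver.OPTIMAL
    return [int(v.solution_value()) for v in x]
```

In practice the candidate bags can be enumerated up to a small size per gift type before scoring; the competition also imposes a minimum number of gifts per bag, which we omit here for brevity.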
Using this approach, we can find the best combinations of gifts. Yet we can still do better! By trying to learn about individual gifts in the test set, we can use some of our submissions to Kaggle to learn more about the actual solution. This is an exploration/exploitation trade-off (the number of submissions is limited), and we discuss the pros and cons of a few learning strategies (random probing, local search) as well as the use of PyMC for this purpose.
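As a toy illustration of the Bayesian updating step (the bag contents and the observed total below are hypothetical), one can treat the unknown weights in a probed bag as latent variables, use the given distributions as priors, and condition on what a submission reveals about that bag's total weight:

```python
import pymc3 as pm

# Hypothetical feedback: a probe submission suggested that a bag holding
# three horses and two dolls weighed roughly 22.7 in total.
with pm.Model() as model:
    horses = pm.Normal("horses", mu=5.0, sigma=2.0, shape=3)  # prior per horse
    dolls = pm.Gamma("dolls", alpha=5.0, beta=1.0, shape=2)   # prior per doll
    pm.Normal("bag_total", mu=horses.sum() + dolls.sum(),
              sigma=0.5, observed=22.7)  # small noise around the revealed total
    trace = pm.sample(2000, tune=1000)

print(trace["horses"].mean(axis=0))  # posterior mean weight of each horse
```

The posteriors over individual gift weights then feed back into the Monte Carlo scoring of candidate bags, sharpening the next round of packing.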
Finally, we will also discuss the perfect solution to this problem, which turns out to be a delightful hack.