Monday 17:10–17:40 in Track 3

What ad is this?

Adam Witkowski

Audience level:
Experienced

Description

Ads on the web often consist of a picture with some text. Based on this information, can you tell what exactly is advertised? In this talk I will describe a system that automatically categorizes ads seen on the web. I will talk about potential approaches to this problem and describe in detail the chosen solution. This system was created as a joint project between Gemius and MIM Solutions.

Abstract

I will talk about a solution to the following problem: we are given an advertisement from the web that consists of a picture and some text. We also know the website the ad links to. We need to determine what brand is advertised (from a list of known brands). A brand can be, for example:

Such a task is quite easy for humans, but hiring humans is expensive, so we looked for a way to automate it. From a machine learning point of view, this is a multiclass classification problem. It is complicated by the large number of classes and by the fact that the input combines text and an image.
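
To make the problem setup concrete, here is a minimal Python sketch of multiclass classification over combined text and image features. It is only an illustration and not the system described in the talk: the brand names, the vocabulary, the coarse colour histogram, and the nearest-centroid classifier are all assumptions chosen to keep the example small and dependency-free.

```python
import math
from collections import Counter

# Hypothetical vocabulary and brands, used only for this illustration.
VOCABULARY = ["shoes", "run", "coffee", "fresh"]


def text_features(text, vocabulary=VOCABULARY):
    """Bag-of-words counts of the ad text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def image_features(pixels, bins=2):
    """Coarse colour histogram; pixels is a list of (r, g, b) in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [h / total for h in hist]


def combine(text, pixels):
    """Concatenate text and image features into one input vector."""
    return text_features(text) + image_features(pixels)


def train_centroids(examples):
    """examples: list of (feature_vector, brand). Returns brand -> mean vector."""
    sums, counts = {}, Counter()
    for vec, brand in examples:
        previous = sums.get(brand, [0.0] * len(vec))
        sums[brand] = [s + v for s, v in zip(previous, vec)]
        counts[brand] += 1
    return {b: [s / counts[b] for s in total] for b, total in sums.items()}


def predict(centroids, vec):
    """Return the brand whose centroid is closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda b: math.dist(centroids[b], vec))


# Toy training data: one labelled ad per (hypothetical) brand.
training = [
    (combine("run faster shoes", [(200, 10, 10)] * 4), "BrandA"),
    (combine("fresh coffee", [(90, 60, 30)] * 4), "BrandB"),
]
centroids = train_centroids(training)

query = combine("new shoes", [(220, 20, 20)] * 4)
print(predict(centroids, query))  # BrandA
```

A real system would face far more classes and need much richer features; the sketch only shows how the two input types can feed a single multiclass classifier.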

There are many possible approaches to such a problem, among them:

I will briefly discuss the pros and cons of those approaches and explain why none of them was good enough for us. Then I will describe the chosen solution.
