Friday 12:15–13:00 in Hall 1

One in a billion: finding matching images in very large corpora

Ryan Henderson

Audience level:
Intermediate

Description

The goal of image-match was to support not only high write volumes (>10k images/s) but also fast lookup of similar images (around 1-2 s across more than 1B images). Although similar paid services and free image hashing libraries exist, this may be the first complete, free, open-source solution. Available at: https://github.com/ascribe/image-match

Abstract

image-match started as an internal project. We needed a way, given some target image, to find similar images among those downloaded by our web crawler (think TinEye).

That meant supporting not only fast, accurate lookup across millions or even billions of images, but also very high-volume insertion, around 10k images per second.

In my talk, I will cover:

  • The Problem: why is finding similar images hard?
  • Algorithm: based on this paper
  • Performance: but does it scale?
  • Alternatives