Tuesday 3:05 PM–4:00 PM in Music Box 5411/Winter Garden 5412 (5th fl)

Reverse image search engines using out-of-the-box machine learning libraries

Leon Yin, yvan

Audience level:
Intermediate

Description

We propose a simple, robust, and scalable reverse image search engine that leverages convolutional features from Keras' pre-trained neural networks and the distance metric from Scikit-Learn's K-Nearest Neighbors. We show example queries using data scraped from Google Images, and dive deeper into how we use the search engine to track the proliferation of memes from the dark web.

Abstract

We use Keras' pre-trained deep neural networks to transform images into matrices of convolutional features. The convolutional features are reshaped in NumPy and fed into Scikit-Learn's K-Nearest Neighbors (KNN).
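A minimal sketch of this pipeline is shown below. The choice of VGG16, the cosine metric, and the file names are illustrative assumptions, not necessarily the exact configuration used in the talk.

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

# Load a pre-trained network without its classification head, so the
# output is a block of convolutional features rather than class scores.
model = VGG16(weights="imagenet", include_top=False)

def featurize(path):
    """Load an image, preprocess it, and return a flat feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = preprocess_input(np.expand_dims(x, axis=0))
    features = model.predict(x)          # shape (1, 7, 7, 512) for VGG16
    return features.reshape(-1)          # flatten to a 1-D vector with NumPy

paths = ["cat.jpg", "dog.jpg", "meme.png"]   # hypothetical image files
X = np.vstack([featurize(p) for p in paths])

# Index the feature vectors; cosine distance requires the brute-force algorithm.
knn = NearestNeighbors(n_neighbors=3, metric="cosine", algorithm="brute")
knn.fit(X)

# Query: the nearest neighbors of an image are its visual matches.
dist, idx = knn.kneighbors(featurize("query.jpg").reshape(1, -1))
```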

This technique leverages model architecture and the diversity of the ImageNet training data to extract differentiable features from images. We walk through the performance of several popular model architectures, which offers insight into the features each model looks for.
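Different architectures can be swapped in with a one-line change. The sketch below compares the feature dimensionality of a few common Keras models; these particular models are examples and may not match the ones benchmarked in the talk.

```python
from keras.applications import VGG16, ResNet50, InceptionV3

# Global average pooling collapses each convolutional feature map into a
# single value, giving one fixed-length vector per image.
architectures = {
    "VGG16": VGG16(weights="imagenet", include_top=False, pooling="avg"),
    "ResNet50": ResNet50(weights="imagenet", include_top=False, pooling="avg"),
    "InceptionV3": InceptionV3(weights="imagenet", include_top=False, pooling="avg"),
}

for name, net in architectures.items():
    # Each architecture yields features of a different dimensionality,
    # which affects both the index size and what the features emphasize.
    print(name, net.output_shape)   # e.g. (None, 512) or (None, 2048)
```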

We also discuss the trade-offs between the different algorithms used to calculate distances in KNN. We use multiprocessing to distribute the KNN search across different subsets of the data, allowing the search engine to scale.
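A minimal sketch of that sharded search, assuming the feature matrix has already been built; the sharding and merging scheme here is illustrative rather than the exact implementation from the talk.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.neighbors import NearestNeighbors

def query_shard(args):
    """Fit a KNN index on one shard and return its best matches."""
    shard, query, k = args
    knn = NearestNeighbors(n_neighbors=min(k, len(shard)),
                           metric="cosine", algorithm="brute")
    knn.fit(shard)
    dist, idx = knn.kneighbors(query.reshape(1, -1))
    return dist[0], idx[0]

def distributed_search(features, query, k=5, n_shards=4):
    """Search each shard in a separate process and merge the results."""
    shards = np.array_split(features, n_shards)
    offsets = np.cumsum([0] + [len(s) for s in shards[:-1]])
    with Pool(n_shards) as pool:
        results = pool.map(query_shard, [(s, query, k) for s in shards])
    # Merge per-shard results and keep the k globally closest images.
    dists = np.concatenate([d for d, _ in results])
    idxs = np.concatenate([i + off for (_, i), off in zip(results, offsets)])
    order = np.argsort(dists)[:k]
    return dists[order], idxs[order]
```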
