Sunday 15:00–15:45 in LG7

Finding the Right Articles - A Supervised Approach to Search

Yasen Kiprov, Pepa Gencheva


Finding the knowledge base articles that hold the answer to a customer's question can be hard. This talk is about our approach and how it grew with the amount of training data we collected. We'll describe several similarity measures, then move on to a manually configurable ensemble search, and finally present a supervised model which picks the right answers from a set of search suggestions.


This talk describes a supervised algorithm for matching hosting-related questions to KB articles and tutorials from the Siteground website.

-- Part One: Beating Google --

Starting without any training data, we'll show how different similarity metrics and document representations can act as a decent search engine. We will describe how we built word2vec and LDA models, what Word Mover's Distance is and how we tuned it to match questions to KB articles.
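To make the Word Mover's Distance idea concrete, here is a minimal sketch of its "relaxed" variant, a cheaper lower bound in which every word simply travels to its nearest counterpart in the other text. This is not the tuned setup from the talk: the two-dimensional vectors in TOY_VECTORS and all function names are invented for illustration, and a real system would look the vectors up in a trained word2vec model.

```python
import math

# Toy 2-d word vectors, invented for illustration only; a real system
# would take them from a trained word2vec model.
TOY_VECTORS = {
    "domain": (1.0, 0.0),
    "dns": (0.9, 0.1),
    "transfer": (0.7, 0.6),
    "move": (0.6, 0.7),
    "email": (0.0, 1.0),
    "mailbox": (0.1, 0.9),
}


def tokenize(text):
    """Lowercase, split on whitespace, keep only words with a known vector."""
    return [w for w in text.lower().split() if w in TOY_VECTORS]


def relaxed_distance(src, dst):
    """Average distance from each source word to its nearest target word."""
    total = 0.0
    for w in src:
        total += min(math.dist(TOY_VECTORS[w], TOY_VECTORS[v]) for v in dst)
    return total / len(src)


def relaxed_wmd(question, article):
    """Symmetric relaxed WMD: the larger of the two one-sided bounds."""
    q, a = tokenize(question), tokenize(article)
    if not q or not a:
        return float("inf")  # nothing to compare
    return max(relaxed_distance(q, a), relaxed_distance(a, q))


if __name__ == "__main__":
    question = "move domain"
    for article in ("transfer dns", "email mailbox"):
        print(article, round(relaxed_wmd(question, article), 3))
```

A lower value means a closer match, so with these toy vectors the question "move domain" lands nearer to "transfer dns" than to "email mailbox" even though the texts share no words.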

We will then go into detail about mixing different similarity approaches to build a search engine which performs better than Google's custom site search on our dataset. We'll also briefly describe how we used gearman to run the searches in parallel.
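One simple way to mix similarity approaches is a weighted average of their scores, with the weights set by hand. The sketch below assumes exactly that; the two scorers (token Jaccard and character-trigram overlap), the SCORERS registry and the weights are hypothetical stand-ins, not the measures or configuration used in the talk.

```python
def token_jaccard(query, doc):
    """Jaccard overlap between the word sets of the query and the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def char_trigrams(text):
    """Set of all character trigrams in the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_overlap(query, doc):
    """Jaccard overlap between character-trigram sets (tolerates typos)."""
    q, d = char_trigrams(query), char_trigrams(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


# Hypothetical registry of similarity measures, each returning a score in [0, 1].
SCORERS = {"tokens": token_jaccard, "trigrams": trigram_overlap}


def ensemble_search(query, articles, weights, top_k=3):
    """Rank articles by the weighted average of the configured scorers."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    ranked = []
    for title, body in articles.items():
        score = sum(w * SCORERS[name](query, body) for name, w in weights.items())
        ranked.append((score / total, title))
    ranked.sort(reverse=True)  # best score first
    return ranked[:top_k]


if __name__ == "__main__":
    articles = {
        "dns": "how to change dns records for a domain",
        "mail": "create an email account and mailbox",
    }
    print(ensemble_search("change domain dns", articles,
                          {"tokens": 0.7, "trigrams": 0.3}))
```

Because each scorer is independent of the others, the per-scorer calls are also the natural unit to hand out to parallel workers.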

-- Part Two: Knowing The Right Answer --

A shortcoming of any search engine is that it has no notion of right and wrong answers. To avoid bothering clients with articles that are very similar to their question but not helpful, we created a classifier to tell the difference. We'll describe the specifics of building a supervised model for our task, along with its features and instances.
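As a rough illustration of what "features and instances" can look like, the sketch below treats each (question, suggested article) pair as one instance and trains a small logistic regression to decide whether the suggestion is helpful. Everything in it is an assumption made for the example: the two features (a search score and a title-overlap score), the hand-made labels, and the choice of logistic regression itself, which is not necessarily the model presented in the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(instances, labels, epochs=2000, lr=0.5):
    """Fit logistic regression with plain full-batch gradient descent."""
    n = len(instances)
    weights = [0.0] * len(instances[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(instances, labels):
            pred = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            error = pred - y
            for i, v in enumerate(x):
                grad_w[i] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_helpful(model, features):
    """Estimated probability that a suggested article answers the question."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, features)))


# Invented training set: one instance per (question, suggested article) pair,
# with hypothetical features (search_score, title_overlap).
INSTANCES = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.2), (0.3, 0.1), (0.7, 0.7), (0.9, 0.1)]
LABELS = [1, 1, 0, 0, 1, 0]  # 1 = helpful, 0 = similar but not helpful

if __name__ == "__main__":
    model = train_logistic(INSTANCES, LABELS)
    for features in [(0.9, 0.85), (0.9, 0.1)]:
        print(features, round(predict_helpful(model, features), 3))
```

The toy data is built so that a high search score alone is not enough: the third and last instances score well in search but are labelled unhelpful, which is the case a plain search engine cannot distinguish.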