In this talk we will explain how we solved the problem of classifying job titles into a job ontology with more than 5000 different classes. We do this by learning a character-based representation of job titles with a bidirectional LSTM (B-LSTM) encoder trained as a Siamese network. You will learn the theory behind these methods and how they can be implemented with the Keras deep learning library.
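As a rough illustration of what "character-based" means at the input side, the sketch below turns a job title into a fixed-length sequence of character indices, the kind of array a character-level encoder could consume. The alphabet, the maximum length, and the name `encode_title` are assumptions made for this example, not details from the talk.

```python
import string

# Hypothetical character vocabulary (an assumption for this sketch).
# Index 0 is reserved for padding and index 1 for unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + " #+-./&"
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 32  # assumed maximum title length in characters


def encode_title(title, max_len=MAX_LEN):
    """Map a job title to a list of max_len character indices."""
    # Lowercase, truncate to max_len, and map unseen characters to UNK_ID.
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in title.lower()[:max_len]]
    # Right-pad short titles so every title has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


encoded = encode_title("C# developer")
```

Working at the character level means that spelling variants such as "pipefitter" and "pipe fitter" share most of their input, which a word-level vocabulary would treat as unrelated tokens.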
Learning representations of textual data is a crucial component in NLP systems. An important application is linking entities extracted from unstructured text to a knowledge base. In our use case, the entities are job titles extracted from resumes or vacancies, and the knowledge base is a hierarchical job title taxonomy. Successfully linking job titles is particularly important in our application, as it directly influences the performance of information retrieval and data analytics solutions.
We will walk you through how we constructed training examples in a domain where large-scale manual annotation is nearly impossible. We will show you how we built a framework to test the invariances we want the model to capture, such as extra words in automatically extracted phrases (e.g. "class 1 driver using own vehicle, london") and spelling variation (e.g. "C Sharp" vs "C#"). Lastly, we introduce a negative sampling strategy that teaches the network to recognize subtle differences between phrases (e.g. "pipe fitter" versus "ship fitter").
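To make the idea of constructing training pairs without manual annotation more concrete, here is a minimal sketch under stated assumptions. The toy taxonomy, the noise phrases, and all function names are hypothetical. Token overlap is used as a simple stand-in for choosing "hard" negatives; it is not necessarily the sampling strategy presented in the talk.

```python
import random

# Hypothetical toy taxonomy: class label -> known title variants.
TAXONOMY = {
    "pipe fitter": ["pipe fitter", "pipefitter"],
    "ship fitter": ["ship fitter", "shipfitter"],
    "c# developer": ["c# developer", "c sharp developer"],
    "truck driver": ["truck driver", "class 1 driver"],
}
# Hypothetical extra words of the kind an automatic extractor may leave in.
NOISE = ["london", "using own vehicle", "full time"]


def add_noise(title, rng):
    """Append an irrelevant phrase; the class should stay the same."""
    return f"{title}, {rng.choice(NOISE)}"


def token_overlap(a, b):
    """Jaccard overlap between the word sets of two phrases."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def hard_negative(label):
    """Pick the other class whose name looks most similar to label."""
    others = [other for other in TAXONOMY if other != label]
    return max(others, key=lambda other: token_overlap(label, other))


def make_pairs(rng):
    """Build (phrase, class label, target) triples for a Siamese setup."""
    pairs = []
    for label, titles in TAXONOMY.items():
        for title in titles:
            # Positive pair: a noisy variant still belongs to its class.
            pairs.append((add_noise(title, rng), label, 1))
            # Negative pair: same title against a confusable class.
            pairs.append((title, hard_negative(label), 0))
    return pairs


pairs = make_pairs(random.Random(0))
```

With this toy data, the negative chosen for "pipe fitter" is "ship fitter", since the two share the word "fitter". Pairs like this push the network to attend to the one word that differs rather than to overall surface similarity.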