Wednesday, Oct. 7, 2020, 3 p.m.–3:30 p.m., Online

Productionizing an unsupervised machine learning model to understand customer feedback

Nikki van Ommeren, Maike Fischer

Audience level:
Intermediate

Description

Have you ever had to read through over 5,000 open-text feedback responses a month?

Our colleagues at ING do exactly that, spending several hours every week. That is why we used open-source Python libraries to develop an unsupervised machine learning model that groups customer feedback with similar meaning into clusters.

Abstract

In this talk you will learn:
• How to use natural language processing (NLP) techniques to create numeric representations of text data.
• How to cluster unstructured text data with similar meaning.
• About the real-life challenges of bringing an unsupervised machine learning model to production in a corporate environment. (Yes, this model actually made it to production ;))
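One of the text representations the talk compares is TF-IDF. As a minimal, dependency-free sketch (not the authors' implementation; the function names `tokenize`, `tfidf_vectors` and `cosine` are hypothetical), the code below turns feedback texts into unit-length TF-IDF vectors and compares them with cosine similarity. It assumes a naive regex tokenizer and a smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, which mirrors scikit-learn's default `smooth_idf=True` setting; in practice a library such as scikit-learn's `TfidfVectorizer` would typically be used instead.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (naive, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one L2-normalised TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (assumption, see lead-in).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; the vectors are already unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    feedback = [
        "The app crashes when I log in",
        "The app crashes after login",
        "Friendly staff at the branch",
    ]
    vocab, vecs = tfidf_vectors(feedback)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The two app-crash complaints share weighted terms and score noticeably higher than the unrelated branch compliment. Note that plain TF-IDF only matches identical tokens ("log in" and "login" do not match), which is one reason embedding approaches such as Doc2Vec or BERT are worth comparing.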

So, whether you are interested in machine learning models and algorithms or in software engineering and productionizing, this presentation will be relevant to you! We assume basic machine learning knowledge.

The talk kicks off with some background on the business use case. Because the model is used globally, we emphasize the requirement that it be language-agnostic and scalable to different consumer markets. Additionally, the stakeholders do not have the capacity for frequent retraining, so the model needed to be ‘low maintenance’.

Given these requirements, we continue with a deep dive into model development. We explain the techniques we considered for text representation (such as BERT, Doc2Vec and TF-IDF) and for clustering (such as DBSCAN, HDBSCAN and K-Means). We also show how we developed our own evaluation framework to assess the performance of the different models.

Finally, we conclude by explaining how we productionized this model on traditional corporate JVM infrastructure by building a Python API in a Docker container.
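The talk names K-Means, DBSCAN and HDBSCAN as candidate clustering models and mentions a custom evaluation framework without describing it. As a hedged illustration only (not the authors' framework), the sketch below pairs plain K-Means (Lloyd's algorithm) with the silhouette coefficient, one common internal metric for comparing clusterings when no ground-truth labels exist. The function names `kmeans`, `silhouette` and `best_k` are hypothetical, and Euclidean distance is assumed; on unit-length vectors such as L2-normalised TF-IDF, squared Euclidean distance equals 2 − 2·cosine, so it ranks neighbours the same way as cosine similarity.

```python
from __future__ import annotations

import math
import random


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: list[list[float]], k: int, n_iter: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init: k randomly chosen points
    labels: list[int] = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: list[list[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means tighter, better-separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster: dict[int, list[float]] = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(euclidean(p, q))
        own = by_cluster[labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within own cluster
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points: list[list[float]], k_values, seed: int = 0) -> int:
    """Pick the k whose K-Means labelling has the highest mean silhouette."""
    scores = {}
    for k in k_values:
        labels = kmeans(points, k, seed=seed)
        if len(set(labels)) >= 2:  # silhouette is undefined for a single cluster
            scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels = kmeans(pts, k=2)
    print(labels, silhouette(pts, labels), best_k(pts, range(2, 4)))
```

Density-based methods such as DBSCAN and HDBSCAN differ in that they do not need the number of clusters up front and can mark outliers as noise, which may suit free-text feedback; a silhouette comparison like this one could be applied to their labels as well, after excluding the noise points.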
