The correct classification of customer service call and chat transcripts is crucial for identifying bottlenecks in customer-facing business processes and for improving customer satisfaction. In this talk we discuss the technical, and even harder business, challenges we had to overcome at KPN in applying multiple Text Analytics techniques to achieve a workable classification accuracy.
Even the fanciest Convolutional Neural Networks achieve poor accuracy in classifying text when the labels in the training data set are not 100% correct. What can you do when improving the quality of the labels is either too expensive or simply impossible, given the very large number of often overlapping classes to predict?
We will walk through our journey and show how we:
- built a CNN for text classification using TensorFlow, which outperformed basic techniques such as Naive Bayes with TF-IDF features on our task;
- applied a few unsupervised learning approaches (LDA, Doc2Vec + K-means) using gensim and compared them by mapping the resulting categories to known business notions;
- and finally introduced a domain-specific ontology/taxonomy framework in which a smart mix of business experts' domain knowledge and trained word embeddings enabled us to increase classification accuracy.
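The idea of combining expert knowledge with word embeddings in a taxonomy can be illustrated with a minimal sketch. This is not KPN's implementation: the category names, seed terms, toy three-dimensional vectors, the 0.95 similarity threshold and the helpers `expand_taxonomy` and `classify` are all hypothetical. Each category starts from a few expert-chosen seed terms and is expanded with vocabulary words whose embeddings lie close (by cosine similarity) to a seed. A transcript is then assigned to the category whose expanded term list it matches most often.

```python
import math
from collections import Counter

# Hypothetical toy word vectors. In practice these would be trained on the
# transcripts themselves (e.g. with Word2Vec) rather than written by hand.
EMBEDDINGS = {
    "invoice":  [0.90, 0.10, 0.00],
    "bill":     [0.85, 0.15, 0.05],
    "charge":   [0.80, 0.20, 0.10],
    "router":   [0.05, 0.90, 0.10],
    "modem":    [0.10, 0.85, 0.15],
    "wifi":     [0.00, 0.80, 0.30],
    "contract": [0.30, 0.10, 0.90],
    "cancel":   [0.25, 0.05, 0.85],
}

# Hypothetical taxonomy: each category gets a few seed terms from domain experts.
TAXONOMY = {
    "billing": ["invoice"],
    "technical": ["router"],
    "subscription": ["contract"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_taxonomy(taxonomy, embeddings, threshold=0.95):
    """Add every vocabulary word whose similarity to a seed term reaches the threshold."""
    expanded = {}
    for category, seeds in taxonomy.items():
        terms = set(seeds)
        for seed in seeds:
            if seed not in embeddings:
                continue  # seed term unknown to the embedding model
            for word, vec in embeddings.items():
                if cosine(embeddings[seed], vec) >= threshold:
                    terms.add(word)
        expanded[category] = terms
    return expanded


def classify(transcript, expanded):
    """Return the category with the most term matches, or None if nothing matches."""
    tokens = Counter(transcript.lower().split())
    scores = {cat: sum(tokens[t] for t in terms) for cat, terms in expanded.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    expanded = expand_taxonomy(TAXONOMY, EMBEDDINGS)
    print(classify("My last bill shows a strange charge", expanded))  # billing
```

With embeddings trained on real transcripts, the hand-written vector table would give way to nearest-neighbour lookups such as gensim's `KeyedVectors.most_similar`. The expansion step is where the two sources of knowledge meet: experts supply and review the terms, while the embeddings suggest synonyms and jargon the experts might miss.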