Friday 9:00–10:30 in Tower Suite 1

Automatic tagging of short texts with scikit-learn and NLTK

Gilbert François Duivesteijn

Audience level:


This hands-on tutorial covers automatic tagging of short text messages with NLTK and scikit-learn. The approach applies to all kinds of short messages, such as email subjects, tweets, or, as demonstrated here, Slack messages. The tutorial will show, step by step, how to tag short texts automatically, enabling the analyst to structure the data and get meaningful statistics.



The talk is about automatic tagging of short texts with a set of predefined tags. The method will be illustrated with short messages from a Slack channel, but it is applicable to all kinds of short messages, e.g. email subjects, tweets, SMS messages and more.


The method is as follows:

It will be demonstrated in a Jupyter notebook, available for everybody to play with during and after the hands-on talk. If time permits, an application of the trained model will be shown in an online dashboard built with Flask.


The talk is intended for an entry-level audience with basic knowledge of Python and applied machine learning.


The message and takeaway is that simple things can be done simply. With just a few lines of code you can have a great starting point with reasonable performance. The method itself might not be state of the art, and it may not achieve top-grade error rates. What makes it interesting is that it lets you get first results in less than a day. You can then refine and iterate to make it better, if desired.
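
To give a feel for how few lines such a first baseline can take, here is a minimal sketch. It is not the speaker's notebook: the abstract does not spell out the method, so the choice of TF-IDF features with a one-vs-rest logistic regression is an assumption, and the messages, the tags, and the helper `predict_tags` are invented for illustration. NLTK could supply tokenization or stemming in front of the vectorizer; that step is left out here to keep the sketch short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: invented messages and tags, for illustration only.
messages = [
    "deploy failed on staging again",
    "the build is broken after the last merge",
    "who wants to join for lunch today",
    "lunch at noon, pizza place around the corner",
    "staging deploy is fixed, build is green",
    "pizza for the team after the deploy",
]
tags = [
    ["ops"],
    ["ops"],
    ["social"],
    ["social"],
    ["ops"],
    ["ops", "social"],
]

# Turn the tag lists into a binary indicator matrix, one column per tag.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)

# Assumed baseline: word and bigram TF-IDF features, one binary
# logistic regression per predefined tag.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(messages, y)


def predict_tags(texts):
    """Return one tuple of predicted tags per input text (may be empty)."""
    return binarizer.inverse_transform(model.predict(texts))


if __name__ == "__main__":
    print(predict_tags(["the deploy broke the build", "anyone up for pizza"]))
```

With so little training data the predictions are not meaningful; the point is only the shape of the pipeline, which can then be refined with better preprocessing, more labelled messages, and proper evaluation.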
