Friday 10:25–12:00 in LG7

Topic Modelling (and more) with NLP framework Gensim

Lev Konstantinovskiy

Audience level:
Intermediate

Description

Topic Modelling is an information retrieval technique to identify topics in a large corpus of text documents. This tutorial will guide you through the process of analysing your textual data through topic modelling - from finding and cleaning your data, pre-processing using spaCy, applying topic modelling algorithms using gensim - before moving on to more advanced textual analysis techniques.

Abstract

Topic Modelling is a great way to analyse completely unstructured textual data - and with the python NLP framework Gensim, it's very, very easy to do this. The purpose of this tutorial is to guide one through the whole process of topic modelling - right from pre-processing your raw textual data, creating your topic models, evaluating the topic models, to visualising them. Advanced topic modelling techniques will also be covered in this tutorial, such as Dynamic Topic Modelling, Topic Coherence, Document Word Coloring, and LSI/HDP.

The python packages used during the tutorial will be spaCy (for pre-processing), gensim (for topic modelling), and pyLDAvis (for visualisation). The interface for the tutorial will be an Jupyter notebook.

The takeaway from the tutorial would be the participants ability to get their hands dirty with analysing their own textual data, through the entire lifecycle of cleaning raw data to visualising topics.

Subscribe to Receive PyData Updates

Tickets

Get Now