Sunday 10:50 a.m.–11:30 a.m.

Low Friction NLP with Gensim

Trent Hauck

Audience level:
Novice

Description

Gensim is a fairly popular NLP library available in Python. In addition to implementing several popular algorithms, it has utilities that make working with the corpus itself easier.

In this talk I'd like to give an overview of Gensim, followed by two examples. The first is an LDA example; the second shows a somewhat novel use of Word2Vec to understand user preferences.

Abstract

Overview

The overview will follow the general arc of an NLP project.

  • Reading the corpus, done here with gensim's streaming API.
  • Transformations: the corpus is often converted to bag-of-words (BOW), and potentially reweighted with something like TF-IDF.
  • Training the model from the corpus.
  • Working with the result, for analysis or otherwise.
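To make the first two steps above concrete, here is a minimal pure-Python sketch of streaming a corpus and converting it to BOW. It does not call gensim; it only mirrors the idea that a corpus can be any iterable yielding one document at a time, and that a BOW document is a list of (token_id, count) pairs. The names StreamedCorpus, build_vocabulary, and to_bow are hypothetical and chosen for illustration; in gensim the vocabulary and conversion roles are played by corpora.Dictionary and its doc2bow method.

```python
from collections import Counter


class StreamedCorpus:
    """Yield one tokenized document at a time instead of holding all tokens in memory."""

    def __init__(self, lines):
        # `lines` is any re-iterable collection of strings (a list here);
        # a real corpus would typically reopen its file inside __iter__.
        self.lines = lines

    def __iter__(self):
        for line in self.lines:
            yield line.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    token2id = {}
    for tokens in docs:
        for token in tokens:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Convert a token list into sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Toy data, made up for this sketch.
raw_lines = ["Human machine interface", "Graph of trees", "Graph minors trees graph"]
corpus = StreamedCorpus(raw_lines)

token2id = build_vocabulary(corpus)                           # first pass: vocabulary
bow_corpus = [to_bow(tokens, token2id) for tokens in corpus]  # second pass: BOW

print(bow_corpus[2])  # [(3, 2), (5, 1), (6, 1)]
```

The corpus is iterated twice, once to build the vocabulary and once to convert documents, which is why the streaming object needs to be re-iterable rather than a one-shot generator.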

Examples

  1. This will be a straightforward application: topic discovery on a corpus, then analyzing the resulting topics to look for patterns.
  2. Next I'll cover how to use Gensim's Word2Vec implementation to better understand customer preferences.
