Thursday 11:00 AM–12:30 PM in Central Park West 6501 (6th fl)

Text Analysis with SpaCy and Scikit-Learn

Jonathan Reeve

Audience level:
Experienced

Description

This tutorial introduces SpaCy, a new natural language processing library written in Cython, and the NLP capabilities of Scikit-Learn, a machine learning library. It is intended for those with experience working with text as data.

Abstract

This tutorial introduces SpaCy, a new natural language processing library written in Cython, and the NLP capabilities of Scikit-Learn, a machine learning library. Using SpaCy, we will cover part-of-speech tagging, dispersion plot analyses, dependency parsing, and word embeddings (word and document vectorization). Using Scikit-Learn, we will perform dimensionality reduction and other tasks. We will also visualize sentence diagrams using Sent2Tree, a custom library built for this tutorial. We will analyze texts such as Jane Austen's Pride and Prejudice and the screenplay of Monty Python and the Holy Grail to answer questions like:

These techniques may also be applied more generally to any text.
