Sunday 16:00–16:45 in LG7

Dimension Reduction and Extracting Topics - A Gentle Introduction

Tariq Rashid

Audience level:
Novice

Description

Text mining has many powerful methods for unlocking insights into the messy, ambiguous, but interesting text created by people.

Singular value decomposition (SVD) is a useful method for reducing the many dimensions of text data and distilling out the key themes in that text, an approach known as topic modelling or latent semantic analysis.

This talk for beginners will gently explain SVD and how to use it.

Abstract

Text mining and natural language processing are hugely powerful fields that can open up the vast amounts of human knowledge, creativity and drivel (!) to automated computing. Examples range from the fun of highlighting trends in internet chatter to the more serious work of finding patterns and links in leaked data sets of public interest.

One key technique is to reduce the many dimensions of text data and distill out the key themes in that text. People call this topic modelling, latent semantic analysis, and a few other names too. The powerful method at the heart of this is called singular value decomposition (SVD).

This talk will gently introduce SVD, explain the mathematics in an accessible manner, and demonstrate its use with the Chilcot Iraq Report as an example dataset.
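To give a rough flavour of the idea, here is a minimal sketch using NumPy. It is not the talk's own code, and the tiny word-document count matrix is made up for illustration (it has nothing to do with the Chilcot report). The names `words`, `counts`, `k`, `doc_coords` and `topic_words` are all assumptions of this sketch.

```python
import numpy as np

# Hypothetical toy data: rows are words, columns are six short documents.
# The first three documents use only vehicle words, the last three only
# cooking words.
words = ["wheel", "seat", "engine", "slice", "oven", "boil"]
counts = np.array([
    [2, 3, 1, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [3, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 3],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 3, 2, 1],
], dtype=float)

# SVD factorises the matrix as counts = U @ diag(s) @ Vt,
# with the singular values in s sorted from largest to smallest.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

# Dimension reduction: keep only the k largest singular values.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document is now described by k numbers instead of one per word.
doc_coords = (np.diag(s_k) @ Vt_k).T

# The words with the largest weights (by absolute value) in each kept
# column of U can be read as a rough "topic".
topic_words = []
for topic in range(k):
    top = np.argsort(-np.abs(U_k[:, topic]))[:3]
    topic_words.append([words[i] for i in top])
    print(f"topic {topic}:", topic_words[topic])
```

With data this cleanly separated, one kept component should pick out the vehicle words and the other the cooking words. Real text is far messier, and word counts are usually weighted (for example with TF-IDF) before the decomposition.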

Example code, notebooks and data sets are public on GitHub, and there is a blog with more discussion of this and other text mining ideas: http://makeyourowntextminingtoolkit.blogspot.co.uk