PyData London 2018 - Presentation: Visualising NLP pipelines with Pynorama

Visualising Natural Language Processing pipelines can be tricky, especially at scale. In this talk I will introduce Pynorama – a visualisation system that is written in Python, easy to set up, scalable, extensible and highly interactive, and that allows you to browse, analyse and understand your datasets and machine learning models.

In many real-life Natural Language Processing (NLP) settings, a researcher develops a pipeline of models, with each stage dependent on the output of its predecessor. When datasets extend to tens or hundreds of millions of documents, even simple pre-processing tasks such as parsing, tokenisation, lemmatisation and vectorisation can result in transformation pipelines of challenging scale. Common NLP tasks such as sentiment classification or topic modelling add multiple further stages. Visualising these pipelines and the models they represent can be very challenging.
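
As a rough illustration of what "each stage dependent on the output of its predecessor" means in practice, the sketch below chains three toy stages and keeps every intermediate result so that it could later be inspected. It is plain Python and does not use Pynorama's API; the function names (`tokenise`, `lemmatise`, `vectorise`, `run_pipeline`), the toy lemmatisation rule and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def tokenise(text):
    # Naive lower-casing whitespace tokeniser; a stand-in for a real one.
    return text.lower().split()


def lemmatise(tokens):
    # Toy rule for illustration only: strip a trailing "s" from longer tokens.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def vectorise(lemmas):
    # Bag-of-words counts.
    return Counter(lemmas)


# Hypothetical stage list: each stage consumes the previous stage's output.
STAGES = [("tokenise", tokenise), ("lemmatise", lemmatise), ("vectorise", vectorise)]


def run_pipeline(text, stages=STAGES):
    """Run the stages in order and keep every intermediate output by name."""
    outputs = {"raw": text}
    current = text
    for name, stage in stages:
        current = stage(current)
        outputs[name] = current
    return outputs


result = run_pipeline("Markets rallied as traders bought stocks")
for name, value in result.items():
    print(f"{name}: {value}")
```

Keeping the output of every stage, rather than only the final one, is what makes it possible to browse a document as it moves through the pipeline. At tens of millions of documents these intermediate outputs would need to live in a proper data store rather than in an in-memory dictionary.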

In this talk I will introduce Pynorama – a visualisation system to address these challenges. Pynorama is easy to set up, scalable, extensible and highly interactive, and allows you to browse, analyse and understand your datasets and machine learning models. Pynorama is developed in Python and JavaScript. It allows you to plug in datasets and custom visualisations and is designed to fit with the workflows of data scientists and machine learning researchers. I will cover Pynorama’s core features: browsing datasets, constructing NLP pipeline graphs, manipulating data (searching, sorting, filtering) and navigating through the different pipeline stages.

We have been actively using and developing Pynorama at Man AHL, and are currently working on making it open-source and available to everyone.

Sunday 13:30–14:15 in Tower Suite 2

Slavi Marinov
