Visualising Natural Language Processing pipelines can be tricky, especially at scale. In this talk I will introduce Pynorama – a visualisation system that is written in Python, easy to set up, scalable, extensible and highly interactive, and that allows you to browse, analyse and understand your datasets and machine learning models.
In many real-life Natural Language Processing (NLP) settings a researcher develops a pipeline of models, with each stage dependent on the output of its predecessor. When datasets extend to tens or hundreds of millions of documents, even simple pre-processing tasks such as parsing, tokenisation, lemmatisation and vectorisation can result in transformation pipelines of challenging scale. Common NLP tasks such as sentiment classification or topic modelling add multiple further stages. Visualising these pipelines, and the models they represent, quickly becomes difficult.
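To make the idea of a staged pipeline concrete, here is a minimal sketch in plain Python. It is not Pynorama's API; the stage functions, the `STAGES` list and `run_pipeline` are hypothetical names chosen for illustration, and the tokeniser and lemmatiser are deliberately toy stand-ins. The point is only that each stage consumes the output of its predecessor, and that keeping every intermediate result is what makes a document inspectable stage by stage.

```python
from collections import Counter


def tokenise(text):
    # Toy tokeniser (assumption): lowercase and split on whitespace.
    return text.lower().split()


def lemmatise(tokens):
    # Toy stand-in for a real lemmatiser: strip a trailing "s" from longer tokens.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def vectorise(lemmas):
    # Bag-of-words counts for the lemmatised tokens.
    return Counter(lemmas)


# Hypothetical stage list: order matters, since each stage feeds the next.
STAGES = [("tokenise", tokenise), ("lemmatise", lemmatise), ("vectorise", vectorise)]


def run_pipeline(text, stages=STAGES):
    """Run each stage on the previous stage's output, keeping every intermediate result."""
    trace = {"raw": text}
    value = text
    for name, stage in stages:
        value = stage(value)
        trace[name] = value
    return trace


trace = run_pipeline("Markets rally as rates fall")
for name, value in trace.items():
    print(f"{name}: {value}")
```

With only three stages and one document the trace is easy to read in a terminal; with many stages and millions of documents it is not, which is the gap a dedicated visualisation tool aims to fill.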
We have been actively using and developing Pynorama at Man AHL, and are currently working on making it open source and available to everyone.