Saturday 14:15–15:00 in Hörsaal 3

Building new NLP solutions with spaCy and Prodigy

Matthew Honnibal

Audience level:
Novice

Description

In this talk, I will discuss how to address some of the most likely causes of failure for new Natural Language Processing (NLP) projects. My main recommendation is to take an iterative approach: don't assume you know what your pipeline should look like, let alone your annotation schemes or model architectures.

Abstract

Commercial machine learning projects are currently like start-ups: many projects fail, but some are extremely successful, justifying the total investment. While some people will tell you to "embrace failure", I say failure sucks, so what can we do to fight it? In this talk, I will discuss how to address some of the most likely causes of failure for new Natural Language Processing (NLP) projects. My main recommendation is to take an iterative approach: don't assume you know what your pipeline should look like, let alone your annotation schemes or model architectures. I will also discuss a few tips for figuring out what's likely to work, along with a few common mistakes. To keep the advice well-grounded, I will refer specifically to our open-source library spaCy and our commercial annotation tool Prodigy.
