Sunday 15:05–15:40 in Auditorium

Annotating data the right way

Amit Beka

Audience level:
Novice

Description

Data quality is crucial for successful ML projects, yet we often neglect the annotation process that produces the data. In this talk, I will present methods, tools and best practices for running annotation projects well. We will review when to choose active learning, how to manage annotators at mass scale, and ways to validate the quality of the annotations.

Abstract

We train our models on annotated data every day, yet most of us know very little about how to create high-quality datasets accurately and effectively.

In this talk we will explore the different options for obtaining annotations and how to set up tooling and metrics (using Prodigy as an example). We will also review methods for producing accurate, clean datasets, while answering these questions:
- Active or passive learning?
- Internal experts or mass-scale crowd workers (e.g. Mechanical Turk)?
- One big task or many small ones? How should it be broken up?
- How do we evaluate agreement and annotation quality using kappa scores?
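The abstract mentions kappa scores for evaluating agreement between annotators. As a rough sketch (not the speaker's code), Cohen's kappa for two annotators compares the observed agreement p_o with the agreement expected by chance p_e, computed from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The function name `cohen_kappa` and the example labels are hypothetical; in practice scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    n = len(labels_a)
    if n == 0:
        raise ValueError("need at least one labeled item")

    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label independently,
    # using each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Two hypothetical annotators labeling the same six sentences.
    annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
    annotator_b = ["pos", "neg", "neg", "neg", "pos", "pos"]
    print(f"{cohen_kappa(annotator_a, annotator_b):.3f}")  # 0.333
```

Here the annotators agree on 4 of 6 items (p_o = 0.667), but with balanced labels they would agree half the time by chance (p_e = 0.5), so kappa is only about 0.33. Correcting for chance agreement is what makes kappa more informative than raw percent agreement.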
