Thursday 10:00 AM–10:45 AM in Track 3 Room

Building Named Entity Recognition Models Efficiently using NERDS

Sujit Pal

Audience level:
Intermediate

Description

Named Entity Recognition (NER) is foundational for many downstream NLP tasks. The open-source NERDS toolkit provides algorithms that can be used to quickly build and evaluate NER models from labeled data in formats such as IOB. New algorithms can be added with minimal effort. This presentation will demonstrate how to create and evaluate new NER models using NERDS, and how to add new NER algorithms to it.
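
The IOB scheme labels each token as the beginning (B-), inside (I-), or outside (O) of an entity. The following is a minimal sketch, not NERDS code, of turning IOB-labeled text into training data and entity spans. It assumes a CoNLL-style layout with one whitespace-separated token and tag per line and blank lines between sentences; `read_iob` and `iob_to_spans` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# Hypothetical span representation: (start token index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def read_iob(text: str) -> Tuple[List[List[str]], List[List[str]]]:
    """Parse CoNLL-style IOB text (assumed layout: 'token tag' per line,
    blank line between sentences) into parallel token and tag lists."""
    sentences, labels = [], []
    tokens, tags = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if tokens:
                sentences.append(tokens)
                labels.append(tags)
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # first column: the token
        tags.append(parts[-1])    # last column: the IOB tag
    if tokens:  # flush the final sentence if the text lacks a trailing blank line
        sentences.append(tokens)
        labels.append(tags)
    return sentences, labels


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Convert IOB tags such as B-Chemical, I-Chemical, O into entity spans."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            # B- always opens a new entity; an I- tag that does not continue
            # the currently open entity is treated leniently as a new entity.
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag == "O":
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
        # Otherwise: an I- tag continuing the open entity, so keep extending it.
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


sample = """Aspirin B-Chemical
reduces O
fever O

Cisplatin B-Chemical
causes O
renal B-Disease
toxicity I-Disease
"""

sents, tags = read_iob(sample)
print(sents[1])               # ['Cisplatin', 'causes', 'renal', 'toxicity']
print(iob_to_spans(tags[1]))  # [(0, 1, 'Chemical'), (2, 4, 'Disease')]
```

Span lists like these are what entity-level evaluation compares: a predicted entity counts as correct only if its boundaries and type both match the gold span.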

Abstract

Named Entity Recognition (NER) is foundational for many downstream NLP tasks such as Information Retrieval, Relation Extraction, Question Answering, and Knowledge Base Construction. While many high-quality pre-trained NER models exist, they usually cover only a small set of popular entity types such as people, organizations, and locations. But what if we need to recognize domain-specific entities such as proteins, chemical names, or diseases? The open-source Named Entity Recognition for Data Scientists (NERDS) toolkit, from the Elsevier Data Science team, was built to address this need.

NERDS aims to speed up the development and evaluation of NER models by providing a set of NER algorithms that are callable through the familiar scikit-learn-style API. The uniform interface lets the same data ingestion and evaluation code be reused across models, resulting in cleaner and more maintainable NER pipelines. NERDS is also easy to extend: adding a new, more advanced NER model is just a matter of implementing a standard NER Model class.
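
To illustrate what a uniform scikit-learn-style interface buys, here is a minimal sketch that is not the actual NERDS API: `NERModel`, `DictionaryNER`, and `evaluate` are hypothetical names, and the sketch assumes models consume a list of token lists and return a list of IOB tag lists. Because every model exposes the same `fit`/`predict` methods, one evaluation function can score any of them.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from typing import Dict, List

Tokens = List[List[str]]  # one list of tokens per sentence
Tags = List[List[str]]    # one list of IOB tags per sentence


class NERModel(ABC):
    """Hypothetical scikit-learn-style base class for NER models."""

    @abstractmethod
    def fit(self, X: Tokens, y: Tags) -> "NERModel":
        ...

    @abstractmethod
    def predict(self, X: Tokens) -> Tags:
        ...


class DictionaryNER(NERModel):
    """Toy model: tags each token with the label it most often had in training."""

    def fit(self, X: Tokens, y: Tags) -> "DictionaryNER":
        counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, tags in zip(X, y):
            for token, tag in zip(tokens, tags):
                counts[token.lower()][tag] += 1
        self.lookup_ = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
        return self  # returning self allows model.fit(X, y).predict(X)

    def predict(self, X: Tokens) -> Tags:
        # Tokens never seen during training default to "O" (outside any entity).
        return [[self.lookup_.get(t.lower(), "O") for t in tokens] for tokens in X]


def evaluate(model: NERModel, X: Tokens, y: Tags) -> Dict[str, float]:
    """Token-level precision/recall/F1 over non-O tags; works for any NERModel."""
    y_pred = model.predict(X)
    tp = fp = fn = 0
    for gold_seq, pred_seq in zip(y, y_pred):
        for gold, pred in zip(gold_seq, pred_seq):
            if pred != "O" and pred == gold:
                tp += 1
            else:
                if pred != "O":
                    fp += 1  # predicted an entity tag that is wrong
                if gold != "O":
                    fn += 1  # missed (or mislabeled) a gold entity tag
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


X_train = [["Aspirin", "reduces", "fever"], ["Cisplatin", "causes", "nephrotoxicity"]]
y_train = [["B-Chemical", "O", "B-Disease"], ["B-Chemical", "O", "B-Disease"]]
X_test = [["Cisplatin", "reduces", "pain"]]
y_test = [["B-Chemical", "O", "B-Disease"]]

model = DictionaryNER().fit(X_train, y_train)
print(model.predict(X_test))            # [['B-Chemical', 'O', 'O']]
print(evaluate(model, X_test, y_test))  # precision 1.0, recall 0.5, f1 about 0.667
```

Swapping in a different algorithm means writing only another `NERModel` subclass; `evaluate` and any data-loading code stay unchanged. Token-level scoring is used here for brevity; entity-level scoring, which requires whole spans to match, is stricter.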

Our presentation will describe the main features of NERDS, then walk through a demonstration of developing and evaluating NER models that recognize biomedical entities. Finally, we will describe a neural-network-based NER algorithm (a Bi-LSTM seq2seq model written in PyTorch) and integrate it into the NERDS NER pipeline.

We believe NERDS addresses a real need for building domain-specific NER models quickly and efficiently. NER is an active field of research, and we hope this presentation will spark interest in NERDS and encourage the community to contribute new NER algorithms and Data Adapters that can in turn help move the field forward.
