Sunday 9:45 AM–11:15 AM in C02

Create a sense2vec model using Gensim and spaCy from scraped news data and integrate it with Flask

Tanu Mittal, Abhishek Kapoor

Audience level:
Intermediate

Description

This workshop shows attendees how to model multiple embeddings (senses) for a word using NLP techniques.

Abstract

Workshop - How to create a sense2vec model using Gensim and spaCy from scraped news data and integrate it with a Flask front end

Sense2vec - Neural word representations have proven useful in Natural Language Processing (NLP) tasks due to their ability to efficiently model complex semantic and syntactic word relationships. However, most techniques model only one representation per word, despite the fact that a single word can have multiple meanings or "senses". Some techniques model words by using multiple vectors that are clustered based on context. However, recent neural approaches rarely focus on the application to a consuming NLP algorithm. Furthermore, the training process of recent word-sense models is expensive relative to single-sense embedding processes. The sense2vec paper presents a novel approach which addresses these concerns by modeling multiple embeddings for each word based on supervised disambiguation, which provides a fast and accurate way for a consuming NLP model to select a sense-disambiguated embedding.

Source - Cornell University Library
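
As a rough illustration of the idea above, the sketch below joins each token with a label (for example a part-of-speech tag) so that different senses of the same word become different vocabulary entries before any embedding model is trained. The helper names `sense_key` and `to_sense_tokens` and the hand-tagged sentences are assumptions made for this example; in the workshop setting the labels would come from a tagger such as spaCy rather than being typed by hand.

```python
def sense_key(text, label):
    """Join a token or phrase and its label into one vocabulary key."""
    # Spaces are replaced so a multi-word phrase stays a single token.
    return text.replace(" ", "_") + "|" + label


def to_sense_tokens(tagged):
    """Convert (text, label) pairs into sense-tagged tokens."""
    return [sense_key(text, label) for text, label in tagged]


# Hand-tagged toy sentences (illustrative only, not tagger output).
tagged_a = [("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("duck", "NOUN")]
tagged_b = [("You", "PRON"), ("should", "AUX"), ("duck", "VERB")]

print(to_sense_tokens(tagged_a))  # ['I|PRON', 'saw|VERB', 'a|DET', 'duck|NOUN']
print(to_sense_tokens(tagged_b))  # ['You|PRON', 'should|AUX', 'duck|VERB']
```

Because "duck|NOUN" and "duck|VERB" are distinct strings, an embedding model trained on such tokens learns a separate vector for each sense.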

Word2vec - Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.

Source - Wikipedia

Flask - Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions. And before you ask: It's BSD licensed!

Source - Flask

Workshop Structure

Attendees will be provided with a corpus of approximately 500,000 news articles scraped from the web.

Tutorial on using spaCy for POS tagging and feeding the noun chunks it provides to Gensim Word2vec.

Tutorial on using Gensim to create a Word2vec model.

Tutorial on converting the Word2vec model to a Sense2vec model.

Writing a REST service in Flask to get similarity results using Sense2vec.

Integrating the REST service with the front end.

Slides will be added soon.
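
The Flask REST step could look roughly like the sketch below. It is not the workshop's actual service: the `/similar` route, the `word` and `sense` query parameters, and the `NEIGHBOURS` table are all assumptions made for this example. The table is a hard-coded stand-in for a trained model, and its scores are placeholder values, not real results.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained sense2vec model.
# Keys follow the "text|LABEL" pattern; scores are placeholders.
NEIGHBOURS = {
    "duck|NOUN": [["goose|NOUN", 0.81], ["swan|NOUN", 0.77]],
    "duck|VERB": [["dodge|VERB", 0.74], ["crouch|VERB", 0.70]],
}


@app.route("/similar")
def similar():
    """Return the stored neighbours for ?word=...&sense=... as JSON."""
    word = request.args.get("word", "")
    sense = request.args.get("sense", "NOUN")
    key = f"{word.replace(' ', '_')}|{sense}"
    if key not in NEIGHBOURS:
        return jsonify({"error": f"unknown key: {key}"}), 404
    return jsonify({"key": key, "similar": NEIGHBOURS[key]})

# To serve locally, call app.run() or use the `flask run` command.
```

A front end could then request `/similar?word=duck&sense=VERB` and render the returned list. In a real service the table lookup would be replaced by a query against the trained model.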

Requirements

Laptop with at least 8 GB of RAM.

Python 3 environment (a virtual environment can be set up from the repository provided in the resources).

Resources

https://explosion.ai/blog/sense2vec-with-spacy

https://rare-technologies.com/word2vec-tutorial/

https://arxiv.org/abs/1511.06388

Virtual environment link will be added soon.

About the Speakers

Tanu Mittal (Sr. Software Engineer) https://www.linkedin.com/in/tanu-mittal-16b12364/

Abhishek Kapoor (Software Engineer) https://www.linkedin.com/in/abhishek-kapoor-4b7b9295
