Friday 15:00–15:30 in Track 2

Exploring word2vec vector space

Julia Bazińska

Audience level:
Novice

Description

Word2vec is a model that represents words as multi-dimensional vectors. Similarity between vectors often reflects a semantic relation between the corresponding words, and exploring the vector space further reveals even more interesting and surprising relations. I will shed some light on the mathematical meaning of word vectors using an interactive visualization.

Abstract

Word2vec is a model that represents words as multi-dimensional vectors. Exploring the relations in this vector space, one finds that it preserves semantic analogies between words surprisingly well. In the talk I will use my interactive visualization, together with pre-trained vectors from GloVe, to illustrate these examples and relations.
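As a rough illustration (not the talk's own code), pre-trained GloVe vectors distributed in the plain-text format can be loaded into a simple word-to-vector dictionary like this; the file name and dimensionality are assumptions:

```python
# Sketch: load pre-trained GloVe vectors from the plain-text format
# (one word followed by its vector components per line).
# The file name "glove.6B.50d.txt" is an assumption; any GloVe text file
# in this format works the same way.
import numpy as np

def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors

vectors = load_glove("glove.6B.50d.txt")
```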

The dot product of normalized vectors often indicates whether two words tend to co-occur in similar contexts, which can be used to find synonyms or antonyms. The solution to a semantic riddle like "X is to Y as A is to ...?" is probably the word whose vector is closest to A + Y - X. Another interesting operation is vector projection onto a word-difference axis, which lets us extract a specific aspect of a word, such as gender (e.g. projecting names onto a he-she axis) or job prestige (e.g. jobs onto a rich-poor axis).
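A minimal sketch of these three operations, assuming `vectors` is a dict mapping words to NumPy arrays (for instance loaded as in the previous snippet); the example words in the comments are only illustrative and results depend on the vectors actually used:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def most_similar(target, vectors, exclude=(), topn=5):
    # Cosine similarity = dot product of normalized vectors.
    t = normalize(target)
    scores = {w: float(normalize(v) @ t)
              for w, v in vectors.items() if w not in exclude}
    return sorted(scores, key=scores.get, reverse=True)[:topn]

def analogy(x, y, a, vectors):
    # "X is to Y as A is to ?": the answer is near A + Y - X.
    target = vectors[a] + vectors[y] - vectors[x]
    return most_similar(target, vectors, exclude={x, y, a}, topn=1)[0]

def project(word, pos, neg, vectors):
    # Projection onto a word-difference axis, e.g. the she-he axis for gender.
    axis = normalize(vectors[pos] - vectors[neg])
    return float(vectors[word] @ axis)

# Hypothetical usage:
# analogy("man", "king", "woman", vectors)   # often "queen"
# project("anna", "she", "he", vectors)      # positive -> closer to "she"
```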
