Biases for and against genders, races, and other groups exist in popular pretrained word embeddings like GloVe and word2vec. This talk will discuss how to detect and remove prejudice in text datasets and the word embeddings derived from them, along with the impact of ignoring it.
Word embeddings have become a widespread component of machine learning algorithms and deep learning architectures as a compact way to represent text data. Unwanted biases in popular text corpora are amplified in the word embeddings trained on them. Left unaddressed, these prejudices carry a high risk of producing unfair and harmful data products and services.
The talk will cover metrics for measuring prejudice in text data as well as methods for removing unwanted bias from trained word embeddings; a brief sketch of one such measurement follows. Current fixes work better on some types of prejudice than on others.
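As a minimal sketch of what such a metric can look like, the snippet below projects occupation words onto a he-she direction in pretrained GloVe vectors, in the spirit of the projection-based measurement and "neutralize" step popularized by Bolukbasi et al. (2016). It assumes gensim is installed and that the "glove-wiki-gigaword-50" model can be downloaded; the word lists are illustrative choices, not material from the talk.

```python
# Sketch: measure gender association of words by projecting onto a he-she direction.
# Assumes gensim and network access to download the pretrained GloVe vectors.
import numpy as np
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-50")  # pretrained GloVe word vectors

# Build a simple gender direction from one definitional pair.
gender_direction = kv["he"] - kv["she"]
gender_direction /= np.linalg.norm(gender_direction)

# Illustrative occupation words; positive scores lean toward "he", negative toward "she".
occupations = ["nurse", "engineer", "teacher", "programmer", "homemaker", "scientist"]
for word in occupations:
    vec = kv[word] / np.linalg.norm(kv[word])
    print(f"{word:12s} {float(vec @ gender_direction):+.3f}")

# The "neutralize" step of hard debiasing removes this component from words
# that should be gender-neutral, so their projection onto the direction is zero.
vec = kv["nurse"] / np.linalg.norm(kv["nurse"])
neutralized = vec - (vec @ gender_direction) * gender_direction
```

This single-pair direction is only a rough proxy; in practice the bias subspace is usually estimated from several definitional pairs, and some forms of prejudice resist this kind of fix more than others.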
This talk is for anyone using word embeddings to build products such as chatbots, sentiment analysis, neural machine translation, information retrieval, and search. That includes data scientists, machine learning engineers, and anyone responsible for the products they build.
At the end of this talk you will know: