PyData London 2018 - Presentation: Sentence embeddings for automated factchecking

There are many models for individual word embeddings but few that encode the meaning of a whole sentence. I will introduce InferSent, a sentence embedding for deciding whether one sentence implies another, and apply it to detecting sentences that contain factual claims. This is the first step in automating the fact-checking process at Full Fact, as only 15% of political TV subtitles contain verifiable claims.

This talk is an update on Full Fact's keynote at last year's PyData London. I will describe the steps involved in producing a factcheck and which of them can be automated. A transfer learning technique called InferSent is really useful in solving this problem.

The first step in automating a factchecking workflow is to extract the sentences containing claims from the rest of the text. The percentage of verifiable claims among all spoken sentences varies greatly between political programmes, from 32% during Prime Minister's Questions in Parliament to 8% in political interviews on The Sunday Politics show. Detecting these sentences efficiently saves a fact-checker a lot of time, as it filters out most of the text.

There are many models for encoding the meaning of a word in a word embedding but very few models for sentences. Popular solutions for sentences are naive averaging of word embeddings and doc2vec.
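As a minimal sketch of the naive averaging baseline mentioned above: the sentence vector is the element-wise mean of the vectors of its words. The function name `average_embedding` and the 3-dimensional `TOY_VECTORS` table are invented for illustration; real word vectors would come from a pretrained model.

```python
from typing import Dict, List

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load pretrained vectors instead.
TOY_VECTORS: Dict[str, List[float]] = {
    "taxes": [1.0, 0.0, 2.0],
    "rose": [0.0, 2.0, 0.0],
    "sharply": [2.0, 1.0, 1.0],
}


def average_embedding(
    sentence: str, vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the element-wise mean of the vectors of the known words."""
    tokens = sentence.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known words: fall back to a zero vector (an arbitrary choice).
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    print(average_embedding("Taxes rose sharply", TOY_VECTORS, dim=3))
    # Word order is ignored, which is one weakness of this baseline.
    print(average_embedding("sharply rose taxes", TOY_VECTORS, dim=3))
```

Both calls give the same vector, which shows why averaging loses information that a dedicated sentence encoder can keep.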

I would like to discuss InferSent, a new neural sentence embedding from Facebook: https://github.com/facebookresearch/InferSent

It was originally designed to detect whether two sentences imply or contradict each other, but it also works quite well for sentence classification tasks.
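For the sentence-pair task, my understanding of the InferSent paper is that the two sentence vectors u and v are combined into one feature vector (u, v, |u - v|, u * v) before being passed to a classifier. The sketch below shows only that combination step in plain Python. The function name `pair_features` and the 2-dimensional input vectors are hypothetical; they are not InferSent output and this is not the repository's API.

```python
from typing import List, Sequence


def pair_features(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Combine two sentence vectors into [u, v, |u - v|, u * v]."""
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same dimension")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]   # element-wise |u - v|
    product = [a * b for a, b in zip(u, v)]         # element-wise u * v
    return list(u) + list(v) + abs_diff + product


if __name__ == "__main__":
    # Hypothetical 2-dimensional sentence vectors, for illustration only.
    premise = [1.0, 2.0]
    hypothesis = [3.0, 0.5]
    features = pair_features(premise, hypothesis)
    print(len(features))  # four times the input dimension
    print(features)
```

For single-sentence classification, such as deciding whether a sentence contains a claim, no pairing is needed: the sentence vector itself can serve as the input to a classifier.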

The annotations used to train the model were collected with Prodigy, a closed-source Python annotation tool, which I will briefly touch on as well.

Saturday 10:15–11:00 in Tower Suite 1

Lev Konstantinovskiy

