Sunday 15:15–16:00 in A208

Analysing user comments on news articles with Doc2Vec and Machine Learning classification

Robert Meyer

Audience level:
Intermediate

Description

I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations in the data. Furthermore, I fed the resulting Doc2Vec document embeddings as inputs to a supervised machine learning classifier. Can we determine which news site a particular user comment originated from?

Abstract

Doc2Vec is a nice neural network framework for text analysis. This machine learning technique computes so-called document and word embeddings, i.e. vector representations of documents and words. These representations can be used to uncover semantic relations. For instance, Doc2Vec may learn that the word "King" is similar to "Queen" but less so to "Database".
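
To make the notion of similarity between embeddings concrete, here is a minimal sketch in plain Python. It compares vectors with cosine similarity, a common choice for embeddings. The three-dimensional vectors are made up by hand for illustration and do not come from a trained Doc2Vec model, whose embeddings typically have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-picked embeddings (assumption: NOT learned from data).
embeddings = {
    "King": [0.9, 0.8, 0.1],
    "Queen": [0.8, 0.9, 0.1],
    "Database": [0.1, 0.0, 0.9],
}

sim_king_queen = cosine_similarity(embeddings["King"], embeddings["Queen"])
sim_king_database = cosine_similarity(embeddings["King"], embeddings["Database"])

print(f"King vs. Queen:    {sim_king_queen:.3f}")
print(f"King vs. Database: {sim_king_database:.3f}")
```

With these toy vectors, "King" scores much closer to "Queen" than to "Database", which is the kind of relation a trained model may pick up from real text.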

I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations in the data. Furthermore, I fed the resulting Doc2Vec document embeddings as inputs to a supervised machine learning classifier. Given a particular comment, can we determine which news site it originated from? Are there patterns among user comments? Can we identify stereotypical comments for different news sites?
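
As a rough sketch of the classification step, the snippet below treats each comment as an embedding vector labelled with its source site and predicts the site of a new comment. It uses a nearest-centroid classifier in plain Python purely for illustration; the abstract does not say which classifier was used. The two-dimensional vectors and the labels `site_a` and `site_b` are hypothetical stand-ins for real Doc2Vec document embeddings and real news sites.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(vectors, labels):
    """Compute one centroid per label from the training vectors."""
    grouped = defaultdict(list)
    for vector, label in zip(vectors, labels):
        grouped[label].append(vector)
    return {label: centroid(vs) for label, vs in grouped.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Hypothetical comment embeddings and site labels (assumption: toy data).
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["site_a", "site_a", "site_b", "site_b"]

centroids = fit_centroids(train_vectors, train_labels)
prediction = predict(centroids, [0.7, 0.3])
print(prediction)
```

In practice the training vectors would be the document embeddings inferred for each comment, and any supervised classifier that accepts numeric feature vectors could take the place of the nearest-centroid rule.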

Besides presenting the results of my experiments, I will give a short introduction to Doc2Vec.
