PyData Warsaw 2019 - Presentation: Sentiment analysis of tweets in Polish language using deep learning

Sentiment analysis of text is a problem that can be solved using Artificial Intelligence. The goal of the talk is to show how to detect five emotions (happiness, anger, sadness, fear, disgust) in Polish-language tweets using open source tools. The presentation covers the whole process, from data collection to building a deep neural network.

Sentiment analysis is one of the problems that can be solved using Artificial Intelligence. Most research on the analysis of emotions in Polish social media texts (especially from Twitter) focuses only on classifying them as positive, negative or neutral. In my talk, I would like to concentrate on detecting five emotions: happiness, sadness, anger, disgust and fear.

During the presentation I will show what the whole process looks like, from data collection, preprocessing and labelling to training the model and presenting the results. The implementation was done in Python.

First, the text data from Twitter was cleaned of links, emojis and unknown symbols, which are unnecessary for the analysis. To serve as model input, the texts were converted to a numeric vector representation. This representation was generated using a pretrained Word2vec model for the Polish language.
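The cleaning step could look roughly like the sketch below. It is only an illustration: the function names `clean_tweet` and `tokenize`, the regular expressions, and the choice of which characters to keep are assumptions, not the code used in the talk.

```python
import re

# Assumed patterns; the talk does not specify the exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep letters (including Polish diacritics), digits, whitespace and basic
# punctuation; anything else (emojis, stray symbols) becomes a space.
DISALLOWED_RE = re.compile(r"[^0-9A-Za-zĄĆĘŁŃÓŚŹŻąćęłńóśźż\s.,!?@#_-]")
WHITESPACE_RE = re.compile(r"\s+")
WORD_RE = re.compile(r"[0-9A-Za-zĄĆĘŁŃÓŚŹŻąćęłńóśźż]+")


def clean_tweet(text):
    """Remove links, emojis and unknown symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = DISALLOWED_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text):
    """Clean a tweet, lowercase it and split it into word tokens."""
    return WORD_RE.findall(clean_tweet(text).lower())


if __name__ == "__main__":
    print(clean_tweet("Kocham ten dzień 😍 https://t.co/abc123 ♥"))
    print(tokenize("Żółć i GNIEW!!! http://x.pl"))
```

The resulting tokens are what would then be looked up in the pretrained Word2vec vocabulary.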

To label the data, words were transformed into lemmas (dictionary forms) using the Morfeusz 2 package, which is an inflectional analyser. This step was necessary to generate the vectors that represent emotions numerically. For every word, a vector with the numeric representation of the five emotions was created using the Nencki Affective Word List. This list provides numeric emotion ratings for words in their lemma form.
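A minimal sketch of this labelling idea follows. Everything in it is hypothetical: the tiny `TOY_LEMMAS` dictionary stands in for the Morfeusz 2 lemmatizer, the scores in `TOY_AFFECT` are made up and not taken from the affective word list, and averaging the word vectors and picking the strongest emotion is just one plausible way to turn word-level scores into a tweet-level label.

```python
EMOTIONS = ("happiness", "anger", "sadness", "fear", "disgust")

# Toy stand-in for a lemmatizer (hypothetical); in the described pipeline
# the lemmas would come from Morfeusz 2.
TOY_LEMMAS = {"kocham": "kochać", "boję": "bać", "wojny": "wojna"}

# Made-up emotion scores per lemma, in the order given by EMOTIONS.
# These numbers are illustrative only, not real word-list ratings.
TOY_AFFECT = {
    "kochać": (6.0, 1.0, 1.5, 1.0, 1.0),
    "wojna": (1.0, 5.0, 5.5, 6.0, 4.0),
}


def tweet_emotion_vector(tokens):
    """Average the emotion vectors of all tokens whose lemma is in the list."""
    lemmas = (TOY_LEMMAS.get(token, token) for token in tokens)
    rows = [TOY_AFFECT[lemma] for lemma in lemmas if lemma in TOY_AFFECT]
    if not rows:
        return None  # no known word, so the tweet cannot be labelled this way
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def label_tweet(tokens):
    """Return the emotion with the highest average score, or None."""
    vector = tweet_emotion_vector(tokens)
    if vector is None:
        return None
    best_index = max(range(len(vector)), key=vector.__getitem__)
    return EMOTIONS[best_index]


if __name__ == "__main__":
    print(label_tweet(["kocham", "cię"]))
    print(label_tweet(["boję", "się", "wojny"]))
```

Tweets with no word covered by the list get no label here; how such tweets were handled in the actual project is not stated.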

The model used for the sentiment analysis was an LSTM network. The first layer of the model was an embedding layer that takes its weights for every word from the Word2vec model. It was followed by LSTM and dense layers.
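The architecture described above might be sketched in Keras as follows. The framework choice, the layer sizes, the frozen embedding, the softmax output (which assumes one dominant emotion per tweet) and the loss are all assumptions made for illustration; the abstract only names the layer types.

```python
import numpy as np
from tensorflow import keras

NUM_EMOTIONS = 5  # happiness, anger, sadness, fear, disgust


def build_model(embedding_matrix, lstm_units=64):
    """Embedding (initialised from pretrained vectors) -> LSTM -> Dense.

    embedding_matrix: array of shape (vocab_size, embedding_dim) whose row i
    holds the pretrained vector of the word with index i.
    """
    vocab_size, embedding_dim = embedding_matrix.shape
    model = keras.Sequential([
        keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=embedding_dim,
            # Start from the pretrained vectors instead of random weights.
            embeddings_initializer=keras.initializers.Constant(embedding_matrix),
            trainable=False,  # assumption: keep the pretrained vectors fixed
        ),
        keras.layers.LSTM(lstm_units),
        keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    # Random numbers stand in for real Word2vec vectors in this demo.
    demo_matrix = np.random.rand(20, 8).astype("float32")
    demo_model = build_model(demo_matrix)
    demo_batch = np.random.randint(1, 20, size=(3, 12))  # 3 tweets, 12 tokens
    print(demo_model.predict(demo_batch, verbose=0).shape)
```

If a tweet can carry several emotions at once, a sigmoid output with a binary cross-entropy loss would be the usual alternative to the softmax used here.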

At the end of the presentation, I will show the results, with examples of both correct classification and misclassification.

Friday 14:25–14:55 in Track 3

Joanna Piwko
