Saturday 15:45–16:20 in Auditorium

Case Study: Making the subjective measurable. Sentiment Analysis on Sustainability reporting

Susanne Groothuis


Sustainability topics are becoming more important in society, and companies feel a greater need to publish their performance and vision on these topics. Analysing these non-financial reports can be a challenge, as judging them remains a matter of opinion. We applied Google's BERT model to sentiment analysis of these reports in order to obtain a more objective metric.


Sentiment analysis is commonly employed where companies are interested in public opinion around their product or brand. In most of these cases, data such as tweets, emails or reviews are used. The benefit of these types of data is that plenty of open datasets are available, the language is often clear or 'strong', and the texts are already labelled (e.g. with ratings). But where the language is more subtle, has 'hidden meaning' or is positively framed, these analyses become a lot more difficult.

The language used in the sustainability reports we were analysing differed too much from that of a typical review. Companies (shockingly) do not tend to use language like "useless product, waste of money" in their annual or sustainability reports, but rather discuss 'challenges' and 'vow to do better'. This meant that we were unable to benefit from existing sentiment analysis solutions or models, as they were trained on the wrong kind of data. In addition, when training new models, we found that they often failed to distinguish between an actually positive sentence and one that was merely 'positively framed'.

Fortunately, recent advances in the field of natural language processing (NLP) have brought forth new 'general language understanding' models that have obtained great results on a wide range of NLP tasks. One of these models is Google's BERT.

In this talk we will discuss how we succeeded in doing sentiment analysis on sustainability reporting using BERT base for transfer learning, how we obtained over 8000 newly labelled 'sustainability sentiment' sentences, and how the process highlighted that providing the correct answer to a subjective matter is not always easy.
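
The abstract does not say how disagreement over labels was measured. As an illustration only, one common way to quantify how subjective a labelling task is would be Cohen's kappa between two annotators, which corrects raw agreement for agreement expected by chance. The sketch below assumes two hypothetical annotators and a three-class label set (`pos`, `neu`, `neg`). The labels are invented for the example and are not data from the talk.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentence labels, invented for illustration only.
annotator_1 = ["pos", "pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neu", "neg", "neu", "pos", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 would indicate that annotators largely agree, while values well below 1 suggest that the 'correct' label is itself a matter of opinion.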
