Sunday 14:10–14:45 in Auditorium

Text analytics on annual reports

Anahita Farokhi, Eltjona Qato

Audience level:
Novice

Description

Reviewing annual reports is currently a manual task, which is not only laborious but also prone to bias. We developed an application that quantitatively analyses the content of annual reports and provides auditors with several metrics, such as readability and sentiment scores.

Abstract

Annual reports give a comprehensive overview of companies' activities and (financial) performance. Reviewing these annual reports, however, is currently a manual task, which is not only laborious but also prone to bias.

As a trusted advisor, we at PwC aimed to use machine learning to improve the quality and objectivity of the review process. To that end, we developed an application that quantitatively analyses the content of annual reports and provides auditors with several metrics, such as readability and sentiment scores.
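The talk does not say which readability formula or sentiment method the application uses, so the following is only a minimal sketch of what two such metrics could look like. It uses the Flesch Reading Ease formula (206.835 - 1.015 x words/sentence - 84.6 x syllables/word) with a rough vowel-group syllable counter, and a lexicon-based "net tone" score. The word lists `POSITIVE` and `NEGATIVE` are hypothetical placeholders; finance work typically relies on a dedicated lexicon such as the Loughran-McDonald dictionary instead.

```python
import re

# Hypothetical word lists for illustration only; a real tool would use a
# finance-specific sentiment lexicon rather than these few words.
POSITIVE = {"growth", "improved", "strong", "profit", "success"}
NEGATIVE = {"loss", "decline", "risk", "impairment", "weak"}

_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups (y counts as a vowel),
    drop a silent final 'e' (but not '-le'), and return at least 1."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = _WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def net_tone(text: str) -> float:
    """Lexicon-based sentiment in [-1, 1]: (pos - neg) / (pos + neg),
    or 0.0 when no word from either list occurs."""
    words = [w.lower() for w in _WORD.findall(text)]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)
```

For example, `net_tone("Profit improved.")` returns `1.0`, while a sentence packed with long, polysyllabic words receives a much lower Flesch score than "The cat sat. The dog ran." Both scores are deliberately simple so that an auditor can trace how each number was produced.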

During this talk, we will give insight into our process by following the journey of an annual report: from uploading the PDF report to our application on Azure, through extracting its text via OCR with pdftotree, to analysing that text through NLP with spaCy.
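The extraction and analysis steps of this journey can be sketched as small, separate stages. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that `pdftotree.parse(pdf_path)` returns the generated HTML as a string when no output path is given, flattens that HTML with the standard library's `html.parser`, and uses spaCy's rule-based `sentencizer` rather than a trained pipeline. The Azure upload step is omitted, and the helper names (`html_to_text`, `pdf_to_text`, `sentences`) are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Flatten HTML (e.g. pdftotree output) into whitespace-normalised text.
    Text nodes are joined with spaces so adjacent word <span>s don't merge."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def pdf_to_text(pdf_path: str) -> str:
    """Extraction stage: PDF -> HTML with pdftotree, then HTML -> plain text."""
    import pdftotree  # third-party; imported lazily so html_to_text works without it

    # Assumption: with no html_path argument, parse() returns the HTML as a string.
    html = pdftotree.parse(pdf_path)
    return html_to_text(html)


def sentences(text: str) -> list[str]:
    """Analysis stage (first step): sentence segmentation with spaCy v3."""
    import spacy  # third-party; imported lazily

    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")  # rule-based splitter, no model download needed
    return [sent.text for sent in nlp(text).sents]
```

Keeping each stage as a plain function makes it easy to test the text cleanup in isolation and to swap in a trained spaCy model (for example via `spacy.load`) once readability or sentiment metrics need tagging or parsing.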
