This talk will present our work on gathering and analyzing 10-K and 10-Q filings using NLP techniques. These publicly available annual and quarterly reports contain valuable information for stockholders on the performance of listed companies. We explore how information can be extracted automatically from the full text and deployed in a quantitative stock selection model. While the NLP methods presented are not novel, we highlight lessons learned from processing the data and transferring academic results into a real-world application.
On the technical side, the talk will (1) sketch our Python pipeline for data set construction and daily updating; (2) describe the methods for analyzing content within reports and relations between reports. On the analytical side, we present experimental results and discuss challenges which arise when determining the usefulness of these methods in the context of a financial model.
We will see that:

- Analyzing this data set requires large-scale resources, and keeping information up-to-date can be tricky.
- Simple bag-of-words based methods for document similarity, readability and sentiment can easily be implemented. Evaluations in a model context show promise, but putting them into practice poses additional challenges.
- Promising research on measuring competitiveness using network analysis is hard to replicate when evaluated with more business-related metrics.
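To give a sense of how little code a simple bag-of-words document similarity needs, here is a minimal sketch using only the Python standard library. It is an illustration under our own assumptions, not the pipeline presented in the talk: the function names `tokenize` and `cosine_similarity`, the regex tokenizer, and the use of raw term counts (rather than, say, TF-IDF weights) are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a plain regex tokenizer; a real pipeline would likely
    also handle numbers, stop words and boilerplate sections.
    """
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the shared vocabulary only; other terms contribute 0.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty document has no direction, so define similarity as 0.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up snippets standing in for the same section of two filings.
    last_year = "Our revenue depends on a small number of customers."
    this_year = "Our revenue depends on a small number of large customers."
    print(f"similarity: {cosine_similarity(last_year, this_year):.3f}")
```

Comparing a company's current filing with its previous one in this way yields a score between 0 and 1. Turning such a score into a usable model input is where the additional challenges mentioned above come in.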
The talk will be application-focused. It should be of interest to developers and researchers looking into financial NLP, and anyone interested in the company filings data set.