In this talk I'll present a crash course in NLP, focusing on tools for document-level summarization. Specifically, I'll go through TF•IDF and topic modelling. We'll use these techniques to make sense of the language people use on the web when describing beer, applying them to some 3 million reviews covering 120,000 different beers to create a concise description for any commercial beer.
Natural language processing (NLP) is among the oldest fields of computer science, dating back at least to the 1950s. In this talk I'll present a crash course in NLP, focusing on tools for document-level summarization and understanding. Specifically, I'll go through TF•IDF and topic modelling. We'll use these techniques to make sense of the language people use on the web when describing beer. I'll introduce a dataset containing some 3 million paragraph-length reviews of 120,000 beers.
We'll use this data to create a concise description for any commercially available beer. These descriptions will bring out, at an intuitive level, how the techniques differ. We will then look at ways to quantify the distance between documents and use them to show how similar different beers are. By the end of this talk, the audience should understand enough to apply document-level NLP in various domains and applications, and perhaps sound a bit more informed when ordering a beer.