Analyze product reviews to identify the dominant attributes of products and quantify the satisfaction level for each attribute, using scikit-learn and gensim.
Companies in every industry collect large amounts of data on all aspects of their business (product, marketing, sales, etc.). Most of this data is unstructured, and extracting actionable insights from it is essential to justify the infrastructure required for Big Data processing. Natural Language Processing (NLP) is an important tool for extracting structured information from unstructured text. I will use NLP techniques to analyze product reviews, identify the dominant attributes of products, and quantify the satisfaction level for specific attributes. This analysis helps explain inconsistent reviews and detect the most significant product attributes. I will use scikit-learn, NLTK, and gensim for data wrangling and modeling (topic modeling, word2vec), and an IPython notebook to demonstrate some of the results of the analysis.
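To make the goal concrete, here is a minimal standard-library sketch of attribute-level satisfaction scoring. It is not the topic-modeling or word2vec pipeline the talk covers. It swaps in a hand-written attribute list and a tiny sentiment lexicon, both of which are assumptions made up for illustration, as are the toy reviews and the function name `attribute_satisfaction`. In the approach described above, the attributes would presumably be discovered from the data rather than listed by hand.

```python
import re
from collections import defaultdict

# Hypothetical toy data and lexicons, for illustration only.
REVIEWS = [
    "The battery is great but the screen is dim.",
    "Terrible battery life.",
    "Love the screen. Great camera too.",
    "Great screen, great camera.",
]
ATTRIBUTES = {"battery", "screen", "camera"}
SENTIMENT = {"great": 1, "love": 1, "dim": -1, "terrible": -1}


def attribute_satisfaction(reviews, attributes, sentiment):
    """Return {attribute: (mention_count, mean_sentiment)}.

    Each review is split into clauses on punctuation and the word "but".
    A clause's score is the sum of the lexicon scores of its words, and
    that score is credited to every attribute named in the clause.
    """
    scores = defaultdict(list)
    for review in reviews:
        for clause in re.split(r"[.,;!?]|\bbut\b", review.lower()):
            words = re.findall(r"[a-z']+", clause)
            clause_score = sum(sentiment.get(w, 0) for w in words)
            for w in words:
                if w in attributes:
                    scores[w].append(clause_score)
    return {a: (len(s), sum(s) / len(s)) for a, s in scores.items()}


result = attribute_satisfaction(REVIEWS, ATTRIBUTES, SENTIMENT)
for attribute, (mentions, mean_score) in sorted(result.items()):
    print(f"{attribute}: {mentions} mentions, mean sentiment {mean_score:+.2f}")
```

The mention count stands in for how dominant an attribute is, and the mean score for satisfaction with it. On this toy data the battery receives one positive and one negative mention, so its mean is zero, a small instance of the inconsistent reviews the analysis aims to explain.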