PyData Delhi 2019 - Presentation: Autocorrect of words and Autosuggest of Sentences

1. My organization already runs a product-selling platform, and we are continuously looking for ways to improve the customer experience. Because a person types the keyword into the search box to find what he or she is interested in, there is always a chance of human error. Previously, if a wrong keyword was entered, the platform showed no results. This inspired me to think about autocorrecting the keyword so that users find their desired product, and about suggesting sentences so that they do not need to type the full query themselves; they can just select a suggestion and buy the product.

2. Following this idea, I built an API that provides autocorrection and autosuggestion of user input. I started from the available dataset of products on the platform. Keywords were generated from the dataset, put in a dictionary sorted by their number of occurrences across the whole dataset, and fed into an Elasticsearch server. For autocorrection, we fed each word to the Elasticsearch server together with its occurrence count and searched the server using the prefix search technique. For autosuggestion, we fed each sentence of the dataset into the Elasticsearch server and searched for matching sentences using the match phrase technique.
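As a rough illustration of the indexing and search steps described above, the sketch below counts keyword occurrences and builds request bodies shaped like Elasticsearch `prefix` and `match_phrase` queries. It is not the code used in the talk: the field names `word`, `count` and `sentence`, the helper names, and the sample product titles are all assumptions made for this example, and no server is contacted.

```python
import re
from collections import Counter


def keyword_counts(sentences):
    """Count how often each lowercase word occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z0-9]+", sentence.lower()))
    return counts


def prefix_query(prefix, size=5):
    """Request body for a prefix search over words, most frequent first.

    Assumes each indexed document has a `word` field and a numeric
    `count` field (hypothetical names).
    """
    return {
        "size": size,
        "query": {"prefix": {"word": {"value": prefix.lower()}}},
        "sort": [{"count": {"order": "desc"}}],
    }


def phrase_query(text, size=5):
    """Request body for a match_phrase search over stored sentences.

    Assumes each indexed document has a `sentence` field (hypothetical).
    """
    return {
        "size": size,
        "query": {"match_phrase": {"sentence": text}},
    }


# Hypothetical product titles standing in for the real dataset.
titles = [
    "Red cotton shirt",
    "Blue cotton shirt",
    "Cotton bed sheet",
]

counts = keyword_counts(titles)

# Documents that would be indexed, sorted by occurrence (highest first).
word_docs = [{"word": w, "count": c} for w, c in counts.most_common()]
```

Sorting the prefix matches on the stored count is one way to let frequent keywords win; the bodies would be passed to whatever Elasticsearch client the API uses.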

3. A. The first roadblock we faced was that, because we used only the product-related dataset as source data, our API did not autocorrect ordinary dictionary words and did not suggest anything for them. We then included dictionary words in the source dataset.

B. Including these dictionary words solved the autocorrection problem, but the autosuggestion problem remained for sentences such as “I want to buy”, because these contain no product-related keywords.

C. One more roadblock concerned autocorrection: dictionary words have no occurrence count, so they all received the same priority, which was much lower than that of product-related keywords. As a result, the API sometimes wrongly autocorrected input to a product-related keyword rather than to an ordinary dictionary word.
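The priority problem in point C can be reproduced with a small in-memory stand-in for the ranked prefix search. This is only a sketch under stated assumptions: the words, the counts, the `DEFAULT_COUNT` value and the `suggest` helper are hypothetical, and a plain dictionary replaces the Elasticsearch index.

```python
def suggest(prefix, word_counts, limit=3):
    """Return words starting with `prefix`, highest count first.

    Ties are broken alphabetically so the result is deterministic.
    """
    matches = [(w, c) for w, c in word_counts.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:limit]]


# Hypothetical occurrence counts taken from a product dataset.
product_counts = {"shirt": 120, "shoes": 95, "sheet": 40}

# Dictionary words carry no occurrence data, so each one gets the same
# low default priority (an assumed value).
dictionary_words = ["ship", "shine", "should"]
DEFAULT_COUNT = 1

merged = dict(product_counts)
for word in dictionary_words:
    merged.setdefault(word, DEFAULT_COUNT)

# A user who means "ship" and has typed "shi" still sees the product
# keyword first, because its count dominates the flat default.
top = suggest("shi", merged)
```

Here `top` is `["shirt", "shine", "ship"]`: the dictionary words are found, but they always rank below any product keyword that shares the prefix.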

4. To improve the API further, we plan to log user inputs in a cache, derive the most frequently used keywords and sentences from that log, raise their priority in the source dataset, and feed them back into the Elasticsearch server. The corrected and suggested results should then improve considerably.
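One possible shape for this feedback loop is sketched below. It assumes the logged inputs are available as a list of strings and that priority is a simple integer count; the `boost_from_log` helper, the `weight` parameter and the sample data are hypothetical and not part of the described API.

```python
from collections import Counter


def boost_from_log(base_counts, query_log, weight=50):
    """Return new counts where each logged word gains `weight` per use.

    `base_counts` is left unchanged; words seen only in the log are added.
    """
    boosted = Counter(base_counts)
    for query in query_log:
        for word in query.lower().split():
            boosted[word] += weight
    return boosted


def most_used_sentences(query_log, limit=3):
    """Return the most frequently logged sentences with their counts."""
    normalized = [q.strip().lower() for q in query_log]
    return Counter(normalized).most_common(limit)


# Hypothetical source counts and a hypothetical cache of user inputs.
base_counts = {"shirt": 120, "ship": 1, "shine": 1}
query_log = ["ship", "ship models", "Ship"]

boosted = boost_from_log(base_counts, query_log)
popular = most_used_sentences(query_log)
```

With these sample values `ship` rises from 1 to 151 and overtakes `shirt` at 120, so re-indexing the boosted counts would let a frequently typed dictionary word outrank a product keyword. How large `weight` should be relative to dataset counts is a tuning question this sketch does not answer.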

Sunday 4:30 PM–5:10 PM in C11

Autocorrect of words and Autosuggest of Sentences

Sayantan Gangopadhyay

