Monday 13:10–13:40 in Main Track

Recognizing products from raw text descriptions using “shallow” and “deep” machine learning

Tymoteusz Wołodźko, Tomasz Płomiński

Audience level:
Intermediate

Description

We will compare “shallow” and “deep” machine learning approaches to solving a natural language processing problem. Pros, cons and consequences of both choices will be discussed.

Abstract

Working with raw text data is usually hard because of its noisy nature. During this talk, we will show two proof-of-concept solutions to the practical problem of recognizing products from raw text descriptions on an online e-commerce platform. We took two different approaches: “shallow” (decision tree ensembles in sklearn/xgboost) and “deep” (a recurrent neural network in Keras) machine learning. “Shallow” methods need more feature engineering, but in certain cases they can provide high accuracy at a lower computational cost. Deep learning, on the other hand, can be fed (almost) raw text and does the feature engineering semi-automatically, but at the cost of tuning the architecture and hyperparameters of the model.
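
To give a feel for the kind of feature engineering the “shallow” route involves, here is a minimal sketch that turns noisy product descriptions into character n-gram count vectors. It is an illustration only, not the pipeline presented in the talk: the helper names (`normalize`, `char_ngrams`, `to_vector`), the ASCII-only cleaning rule, the trigram size, and the sample descriptions are all assumptions made for this example.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumeric characters into one space.

    Assumption: ASCII-only cleaning; accented letters would be treated as separators.
    """
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the normalized text."""
    cleaned = normalize(text)
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def to_vector(text: str, vocabulary: list[str], n: int = 3) -> list[int]:
    """Map a description to a fixed-length count vector over a known n-gram vocabulary."""
    counts = char_ngrams(text, n)
    return [counts[gram] for gram in vocabulary]


# Made-up descriptions of the same product, written in two noisy ways.
descriptions = ["iPhone 7 32GB  black!!", "IPHONE7 32 gb (blk)"]

# The vocabulary is every n-gram seen in the sample, in a fixed (sorted) order.
vocabulary = sorted(set().union(*(char_ngrams(d) for d in descriptions)))

# One row per description; each row could serve as input features for a tree ensemble.
matrix = [to_vector(d, vocabulary) for d in descriptions]
```

Character n-grams are used here because they tolerate spacing and casing differences better than whole-word tokens; a recurrent network would instead consume the (almost) raw character or token sequence and learn such patterns itself.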
