Friday 13:10–13:40 in Track 1

Developing Data Science products - Agile approach at Grupa Pracuj

Jan Zyśko, Magdalena Kalbarczyk

Audience level:
Intermediate

Description

We use a case study to present our approach to developing Data Science products at Grupa Pracuj. Agile development and maintenance of such products pose unique challenges, because their usefulness depends strongly on accurate models and efficient data pipelines. In the talk we go through the successive phases of developing one such product, which uses Deep Learning to solve an NLP problem.

Abstract

There is a long and bumpy road from defining business needs to creating a useful and understandable Data Science tool for internal or external users. Undertaking such a task requires preparing data flows, developing Machine Learning models, and presenting the results to end users. Moreover, all of this has to be done in close collaboration with the business side to ensure rapid prototyping and maximum impact. In this talk, we present a case study of one such project.

Over the last several months we have been addressing the need to predict the behaviour of pracuj.pl users. We started with an experiment to check whether users behave predictably enough on a macro scale for our models to achieve useful results.

After obtaining promising results, we moved on to building an MVP for internal use, to see whether the product would be useful to our CC department as a Django application. At this point we had a fairly complicated technology stack: Python, Hadoop, SQL Server, and AWS. This was because we prioritized development speed over seamless integration.

Once the usability and usefulness of the application were confirmed, we moved on to building a proper ETL pipeline, which minimizes integration and security issues and offers good scalability at a reasonable computing cost. This work will most likely extend into Q4.

Our future plans for the project include, in addition to continuous work on model accuracy, drawing on very recent research on the interpretability of NLP models in order to give our end users actionable feedback. We are also looking into the possibility of presenting the model's insights directly to pracuj.pl customers.

1. Introduction

2. Solution development

3. What we have learned

4. Questions
