In data-driven jobs we tend to accumulate a lot of technical debt, especially as beginners, because we focus on the data rather than on our methods. We lose an enormous amount of time re-iterating through code, stuck in an endless clean-and-try loop. The question is: can we apply software-inspired methods to data analysis to establish better habits? Happily, the answer is yes.
People working in data-driven jobs are naturally focused on getting their data as clean as possible. We can't compromise on that goal, since clean data is the bottleneck that stands between most of us and the more advanced tasks on the roadmap, but we can change our methodology to reach it in record time. An established workflow can save us time. But how?
Traditionally, we do the data analysis work in Jupyter notebooks and end up with several versions of the cleaned data and untested code, which becomes a headache the moment we hit a never-seen-before case.
The answer to this problem is the main focus of this talk: design patterns and best practices learned from software engineering and applied to data analysis. We will go through a quick demo showing the before and after, and how to consolidate what we've learned into a maintainable workflow.
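As one illustration of the kind of practice the talk covers (the function names and cleaning rules here are made up for the example, not taken from the demo): instead of one-off notebook cells, cleaning steps can live in small, composable functions with a test pinning down their behaviour.

```python
# Illustrative sketch: cleaning logic as small, testable functions
# rather than ad-hoc notebook cells.

def strip_whitespace(record: dict) -> dict:
    """Trim surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def drop_incomplete(records: list, required: tuple) -> list:
    """Keep only records where every required field is present and non-empty."""
    return [r for r in records if all(r.get(f) not in (None, "") for f in required)]

def clean(records: list) -> list:
    """The whole pipeline is one re-runnable function, not scattered cells."""
    return drop_incomplete([strip_whitespace(r) for r in records],
                           required=("id", "name"))

# A small test catches the never-seen-before case early:
raw = [{"id": "1", "name": " Ada "}, {"id": "2", "name": ""}]
assert clean(raw) == [{"id": "1", "name": "Ada"}]
```

Because the pipeline is a plain function, it can be re-run on every new data version instead of being re-typed, and its tests fail loudly when an unexpected case appears.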
The talk is aimed mainly at fresh graduates and beginners: it is simply a sharing of experience, and an open discussion of new ideas.