Thursday October 28 5:30 PM – Thursday October 28 6:00 PM in Talks I

Time: The most misunderstood dimension in data modelling

Sergii Mikhtoniuk

Prior knowledge:
No previous knowledge expected

Summary

Time affects all aspects of human life, but in data it remains difficult to work with and is often ignored. Let's take a step back and look at the evolution of temporal data modelling in software and data engineering, and see how adding time back to data can save you from many mistakes and expensive redesigns, and improve the state of data globally.

Description

When young engineers and data scientists learn about databases, they are always presented with examples that use non-temporal models and update data in place. Modelling time is considered hard and is often avoided in database courses. Many of us carry this bias for decades into our careers. When faced with audit and data analytics challenges that require understanding how data evolved over time, we often rush to counter the loss of temporality using techniques like periodic snapshots, change data capture, and historization, without fully realizing their flaws.

The mismatch between non-temporal services (OLTP) and temporal analytics systems (OLAP) is the main point of contention between software and data engineers in many companies. It leads to multiple reworks and expensive system redesigns, results in high complexity, and prevents us from using data efficiently.

As with any other information, it's much easier to preserve temporality than to reconstruct it later. In this talk we will look at the evolution of temporal modelling techniques that happened in parallel in the software, data analytics, version control, and blockchain domains. We will see how they are all about to converge on a key set of principles that just might be the one "true form" of data we all seek.

We will also look at the tools currently available for working with temporal data, highlight major flaws in batch processing, and peek at the next generation of tools that will take us towards real-time and more reliable data.