In the same way that we need to make assertions about how code functions, we need to make assertions about data, and unit testing is a promising framework. In this talk, we'll explore what is unique about unit testing data, and see how Two Sigma's open source library Marbles addresses these unique challenges in several real-world scenarios.
In the same way that we need to make assertions about how code functions, we need to make assertions about data: that all the data are there, that they contain the signals that we expect at the fidelity we need, etc. And because data are always changing, this isn't a one-time cost. We need to continually assert that data are meeting our expectations.
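As a rough illustration of what such assertions might look like, here is a minimal sketch using only Python's standard `unittest` module. It does not use Marbles. The records, field names, and expected dates are hypothetical stand-ins for a real data feed.

```python
import unittest

# Hypothetical sample records standing in for a real data feed.
RECORDS = [
    {"date": "2018-01-02", "price": 101.5},
    {"date": "2018-01-03", "price": 102.1},
    {"date": "2018-01-04", "price": 100.9},
]

# Hypothetical expectation: the dates we assume the feed should cover.
EXPECTED_DATES = {"2018-01-02", "2018-01-03", "2018-01-04"}


class PriceDataTestCase(unittest.TestCase):
    """Assertions about the data itself rather than about code."""

    def test_all_dates_present(self):
        # "All the data are there": no expected date is missing or extra.
        observed = {record["date"] for record in RECORDS}
        self.assertEqual(observed, EXPECTED_DATES)

    def test_prices_are_positive(self):
        # A basic sanity check on values; subTest reports each bad row.
        for record in RECORDS:
            with self.subTest(date=record["date"]):
                self.assertGreater(record["price"], 0)


def run_data_tests():
    """Run the data tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(PriceDataTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_data_tests()
```

Because data keep changing, a suite like this would presumably be rerun each time new data arrive, not just once.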
In this talk, we'll see how to apply unit testing to data in several real-world scenarios. We'll see where the approach works well, where it doesn't, and why, highlighting the unique challenges of data and of unit testing data. Finally, we'll introduce Marbles, a new open source library from Two Sigma that was developed to address these challenges, and talk about what we learned about software unit testing while trying to build a tool for data unit testing.
The audience will leave this talk having seen concrete examples of how to unit test data, and will be able to download and install Marbles to start writing their own tests right away. This talk will be most valuable for audience members who are already familiar with unit testing, but we'll cover the basics and show lots of examples, so it will also be valuable for those who are new to unit testing.