This tutorial will introduce attendees to the concepts of test-driven data analysis (TDDA) and to practical, hands-on use of the tdda library (available from GitHub and PyPI) for (1) writing reference tests and (2) generating and verifying constraints from data, using Pandas data frames.
TDDA aims to bring the ideas and benefits of test-driven development to the arena of data analysis, augmenting those ideas as appropriate. There are two central planks of TDDA at present:
1. The idea of a reference test, which is a lot like a system or integration test for an analytical process.
2. The idea of using constraints to verify input, intermediate and output data for/from analytical processes.
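
To make the first plank concrete, here is a minimal sketch of a reference test written with nothing but the standard library's unittest module. It is not the tdda API: `build_report`, `REFERENCE` and `mask_variable_parts` are hypothetical names invented for this illustration, and in practice the expected output would normally live in a reference file rather than in the test module.

```python
import re
import unittest


def build_report(values, run_date):
    """Hypothetical analytical process: summarise values in a text report."""
    total = sum(values)
    mean = total / len(values)
    return (
        f"Report generated: {run_date}\n"
        f"count: {len(values)}\n"
        f"total: {total}\n"
        f"mean: {mean:.2f}\n"
    )


# Stored "expected" result, verified by hand when it was first produced.
REFERENCE = (
    "Report generated: 2017-01-01\n"
    "count: 4\n"
    "total: 10\n"
    "mean: 2.50\n"
)


def mask_variable_parts(text):
    """Blank out the run date, which legitimately differs between runs."""
    return re.sub(
        r"^Report generated: .*$",
        "Report generated: <DATE>",
        text,
        flags=re.MULTILINE,
    )


class TestReport(unittest.TestCase):
    def test_report_matches_reference(self):
        # Run the whole process end to end, then compare with the reference,
        # ignoring the component that is expected to vary.
        actual = build_report([1, 2, 3, 4], "2024-05-06")
        self.assertEqual(
            mask_variable_parts(actual), mask_variable_parts(REFERENCE)
        )
```

Saved as a module, this can be run with `python -m unittest`. The test exercises the process as a whole, which is what makes it resemble a system or integration test, and the masking step stands in for the handling of variable components.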
The tdda library provides tooling support for both of these. Its major current components are:
- Support for writing tests, under unittest or pytest, that involve comparison of complex objects (e.g. graphs, dataframes etc.), possibly with variable components, and for regenerating reference ("expected") results easily when they have changed (after verification!)
- Support for automatically generating suggested constraints from example datasets/data frames (including Pandas DataFrames)
- Support for verifying a dataset/dataframe against a set of constraints
- Support for generating regular expressions from example strings, for possible use as constraints (or otherwise).
(It will probably do more by May, but these things are there now!)
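
The constraint-related components can be pictured with a small, dependency-free sketch. It works on plain lists of dictionaries rather than Pandas data frames, and `kind`, `discover` and `verify` are hypothetical helpers written for this illustration; they are not the tdda library's functions, and the constraint kinds shown (nullability, type, minimum, maximum) are only an assumed subset of what a real tool might suggest.

```python
def kind(value):
    """Classify a value; ints and floats are both treated as 'number'."""
    if type(value) in (int, float):
        return "number"
    return type(value).__name__


def discover(records):
    """Suggest simple per-field constraints from example records."""
    constraints = {}
    for field in sorted({key for rec in records for key in rec}):
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None]
        suggested = {"allow_nulls": len(present) < len(values)}
        kinds = {kind(v) for v in present}
        if len(kinds) == 1:
            suggested["type"] = kinds.pop()
            if suggested["type"] == "number":
                suggested["min"] = min(present)
                suggested["max"] = max(present)
        constraints[field] = suggested
    return constraints


def verify(records, constraints):
    """Return (row_index, field, message) for every constraint failure."""
    failures = []
    for i, rec in enumerate(records):
        for field, c in constraints.items():
            value = rec.get(field)
            if value is None:
                if not c["allow_nulls"]:
                    failures.append((i, field, "null not allowed"))
                continue
            if "type" in c and kind(value) != c["type"]:
                failures.append((i, field, "wrong type"))
                continue
            # min/max are only ever set for fields whose type is 'number'.
            if "min" in c and value < c["min"]:
                failures.append((i, field, "below min"))
            if "max" in c and value > c["max"]:
                failures.append((i, field, "above max"))
    return failures


# Example data believed to be good, used to suggest constraints.
training = [
    {"age": 34, "country": "UK"},
    {"age": 51, "country": "FR"},
    {"age": 27, "country": None},
]
constraints = discover(training)

# A later batch, checked against the suggested constraints.
new_batch = [
    {"age": 45, "country": "DE"},
    {"age": 140, "country": "UK"},
    {"age": "n/a", "country": None},
    {"age": None, "country": "US"},
]
failures = verify(new_batch, constraints)
```

Because the constraints are only suggestions inferred from examples, they are meant to be reviewed before use: here the discovered maximum age of 51 is an artefact of a three-row sample, and an analyst would probably want to widen it by hand.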
This tutorial will show attendees how to apply these ideas using the tdda library. Attendees will be able to work with their own analytical processes and/or datasets, or with example data that will be provided.