Saturday October 30 9:00 AM – Saturday October 30 9:30 AM in Talks I

What could possibly go wrong when evaluating forecasts?

Malte Tichy

Prior knowledge:
Previous knowledge expected
Python, regression models for counts, metrics/loss functions for model evaluation


Loss functions like absolute error are often used to evaluate predictions for countable quantities. These loss functions suffer, however, from pitfalls with hazardous consequences, like choosing a worse model over a better one. This talk equips you with the tools to assess forecast quality. We drill into what values loss functions can achieve, and how to identify and fix biased evaluations.


This talk will equip you with the mathematical tools and the probabilistic mindset to critically assess predictive models for countable, integer-valued quantities, such as the number of units sold or the number of clicks. This will help you, as a data scientist, to evaluate models, to put model quality metrics into a meaningful context, to communicate model quality to stakeholders, and to manage expectations.

In our examples, we will use open-source tooling from scipy.stats and similar packages to explore how good a prediction for countable events can be, in principle, and what the consequences are for evaluation metrics. We will encounter some (actually quite entertaining) paradoxes on the way, and learn how to avoid or resolve them, so they don’t hinder your model evaluation.
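As a rough illustration of what "how good a prediction can be, in principle" might look like with scipy.stats, here is a minimal sketch. It assumes the counts follow a Poisson distribution whose rate is known exactly, which is an assumption of this example and not necessarily the setup used in the talk. The helper names `expected_abs_error` and `best_achievable_mae` and the example rates are hypothetical.

```python
import numpy as np
from scipy.stats import poisson


def expected_abs_error(rate, point_forecast):
    """Expected |Y - point_forecast| when Y ~ Poisson(rate) (assumed model)."""
    # Truncate the infinite support where the remaining mass is negligible.
    upper = int(poisson.ppf(1 - 1e-12, rate)) + 1
    counts = np.arange(upper + 1)
    return float(np.sum(poisson.pmf(counts, rate) * np.abs(counts - point_forecast)))


def best_achievable_mae(rate):
    """Lowest expected absolute error any single point forecast can reach."""
    # The median of the distribution minimizes the expected absolute error.
    return expected_abs_error(rate, poisson.median(rate))


if __name__ == "__main__":
    # Illustrative rates only; they are not taken from the talk.
    for rate in (0.5, 2.0, 10.0, 100.0):
        mae = best_achievable_mae(rate)
        print(f"rate={rate:6.1f}  best MAE={mae:.3f}  MAE/rate={mae / rate:.3f}")
```

Under this assumption, even a forecast that knows the true rate cannot reach zero error. At a rate of 0.5 the best point forecast is 0, so the expected absolute error equals the mean itself. For large rates the error relative to the mean shrinks, roughly like sqrt(2 / (pi * rate)).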

If you are interested in the math behind model evaluation and you have worked with regression models before, then this talk is for you. No in-depth knowledge of statistics is required, only the willingness and openness to think conceptually and deeply about what a prediction is actually supposed to mean. You will be astonished by what can go wrong, understand why it happens, and learn how to avoid these obstacles.

Takeaways:
- There is no shortcut between simple metrics and model quality, and there are pitfalls on the path of model evaluation: it’s easy to come to wrong conclusions, but there are guardrails that you can follow.
- Predictive technology has limits, just like every technology. We need to be aware of these limits, evaluate them, and benchmark our technology against realistic limits.
- By following some guideline questions, you can make sure the evaluation you do really reflects the business value of your prediction.
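One way a simple metric can lead to a wrong conclusion is sketched below. The sketch assumes Poisson-distributed counts with a true mean of 0.6 and compares two hypothetical point forecasts: the true mean (unbiased) and a constant 0 (biased low). The scenario, the rate, and the helper `expected_losses` are illustrative assumptions, not material from the talk.

```python
import numpy as np
from scipy.stats import poisson

TRUE_RATE = 0.6  # assumed true mean of the counts (illustrative value)


def expected_losses(point_forecast, rate=TRUE_RATE):
    """Return (expected absolute error, expected squared error) for Y ~ Poisson(rate)."""
    # Truncate the infinite support where the remaining mass is negligible.
    upper = int(poisson.ppf(1 - 1e-12, rate)) + 1
    counts = np.arange(upper + 1)
    probs = poisson.pmf(counts, rate)
    errors = counts - point_forecast
    mae = float(np.sum(probs * np.abs(errors)))
    mse = float(np.sum(probs * errors ** 2))
    return mae, mse


# Forecast A predicts the true mean; forecast B always predicts zero.
mae_unbiased, mse_unbiased = expected_losses(TRUE_RATE)
mae_zero, mse_zero = expected_losses(0.0)

print(f"unbiased forecast: MAE={mae_unbiased:.3f}  MSE={mse_unbiased:.3f}")
print(f"always-zero:       MAE={mae_zero:.3f}  MSE={mse_zero:.3f}")
```

In this setup the always-zero forecast gets the lower absolute error even though it systematically under-predicts, while squared error ranks the unbiased forecast first. The reason is that absolute error rewards predicting the median, which is 0 here, and not the mean.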