There has been a boom in applying machine learning to business problems involving big data, yet very few data scientists and statisticians are familiar with methods of causal inference. This talk is a primer on causal inference: what it is, why it matters and how we can apply it to projects.
Questions involving causal mechanisms are ubiquitous, e.g.: How does passive smoking affect the mortality of non-smokers? What is the effect of minimum wage laws on employment? What are the causes of customer churn at a software company?
Questions like these do not lend themselves well to formal experiments, but they can still be tackled with causal inference methods. The basic concepts of causal inference are accessible to anyone with a working knowledge of simple regression analysis and statistics.
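To give a flavour of how far simple regression can take you, here is a minimal sketch on simulated data. It is not taken from the talk. The variable names, coefficients and data-generating process are all hypothetical. It assumes a single observed confounder `z` that drives both the "treatment" `x` and the outcome `y`, with a true causal effect of `x` on `y` of 2.0. Regressing `y` on `x` alone is biased by the omitted confounder. Adding `z` to the regression (regression adjustment) recovers the causal effect, but only because we know, by construction, that `z` is the sole confounder and that the linear model is correctly specified.

```python
import numpy as np

# Hypothetical simulated observational data: z confounds x and y.
# The true causal effect of x on y is 2.0 (an assumption built into the simulation).
rng = np.random.default_rng(seed=0)
n = 100_000
z = rng.normal(size=n)                      # confounder
x = 1.5 * z + rng.normal(size=n)            # "treatment" depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # outcome depends on both x and z


def ols(columns, target):
    """Ordinary least squares with an intercept; returns the slope coefficients only."""
    design = np.column_stack([np.ones(len(target))] + columns)
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs[1:]


naive = ols([x], y)[0]        # y on x only: biased by the omitted confounder z
adjusted = ols([x, z], y)[0]  # y on x and z: adjusts for the confounder

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

With this set-up the naive slope lands near 3.4 (it absorbs part of the effect of `z`), while the adjusted slope sits close to the true value of 2.0. In real observational data you rarely know which variables are confounders, which is exactly the model-specification question that causal inference methods help to answer.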
In this talk I will give an introduction to causal inference, with examples drawn from my experience at a tech startup and in the social sciences.
Statisticians struggled for centuries to formalise an approach to causal questions, and the 'toolkit' for causal inference has only really emerged in the last few decades. Its methods have developed in fields such as epidemiology, econometrics and computer science, yet few data scientists who do regression modelling have any familiarity with them.
Econometrics certainly equipped me with a way to draw causal diagrams, but I was confused about model specification in big datasets, especially those that do not come from formal experiments, until I discovered other approaches to causal inference. I have also been working on a personal project, Appelpy (🍏🥧 Applied Econometrics Library for Python), to make econometric methods more accessible to Python users.