Monday 3:15 PM–4:00 PM in The Trojan Ballroom / ML

Attacking Clustered Data with a Mixed Effects Random Forests Model in Python

Sourav Dey

Audience level:
Intermediate

Description

Clustered data is all around us. The best way to attack it? Mixed effects models. Sourav will explain the use cases of MERF, how the mixed effects random forests model marries classical mixed effects modeling with modern machine learning algorithms, and how it extends to other advanced modeling techniques such as gradient boosting machines and deep learning.

Abstract

A lot of data in the wild has a clustered structure; in fact, clustered data is all around us. The best way to attack it? Mixed effects models. Building on the work of Prof. Larocque from HEC and Prof. Ahlem from l’UQAM, Sourav and the team at Manifold developed an open source Python package for the community to use and build upon: Mixed Effects Random Forests (MERF).

TL;DR: MERFs are great if your data has non-negligible random effects, that is, if there are large idiosyncrasies by cluster. Want to learn more? Read on.

Sourav will explain how the mixed effects random forests model marries classical mixed effects modeling with modern machine learning algorithms, and how it extends to other advanced modeling techniques such as gradient boosting machines and deep learning. He will dive into the MERF algorithm, covering its origins, model, principles, mathematical details, and predictions for known and new clusters. He will also walk through example use cases of mixed effects random forests and demonstrate MERF performance on synthetic and real data.
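To make "predictions for known and new clusters" concrete, here is a minimal sketch. It assumes the usual mixed effects form y = f(X) + Z b_i + e, simplified to a random intercept (Z = 1), where f is the fixed effects part (a random forest in MERF) and b_i is the learned offset for cluster i. The function `predict`, the stand-in `fixed_effect`, and the cluster names are hypothetical and are not the MERF package's API.

```python
from typing import Callable, Dict, Hashable


def predict(
    x: float,
    cluster: Hashable,
    fixed_effect: Callable[[float], float],
    random_intercepts: Dict[Hashable, float],
) -> float:
    """Random-intercept mixed effects prediction.

    Known cluster: fixed effect plus that cluster's learned offset.
    New cluster: fixed effect only, because the offset falls back to its
    prior mean of zero.
    """
    return fixed_effect(x) + random_intercepts.get(cluster, 0.0)


# Hypothetical stand-in for a trained fixed effects model (e.g. a random forest).
def fixed_effect(x: float) -> float:
    return 2.0 * x + 1.0


# Hypothetical learned offsets for clusters seen during training.
b_hat = {"store_a": 1.5, "store_b": -0.5}

known = predict(3.0, "store_a", fixed_effect, b_hat)    # 7.0 + 1.5
unseen = predict(3.0, "store_z", fixed_effect, b_hat)   # 7.0 + 0.0
print(known, unseen)
```

The only difference between the two cases is the lookup: a cluster seen in training gets its own correction, while an unseen cluster gets the population-level prediction.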

Learn the history of mixed effects modeling, why the mixed effects random forests model is a better way to attack clustered data than other modeling techniques such as complete pooling, one-hot encoding, and the classical mixed effects model, and how to use the open source MERF Python package on your own data.
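The contrast between these techniques can be sketched on synthetic data with an intercept-only model. Complete pooling predicts one global mean for everyone. No pooling predicts each cluster's own mean, which is what one-hot encoding the cluster amounts to in this simplified setting. A mixed effects model shrinks each cluster mean toward the global mean depending on how much data the cluster has. This is an illustration under stated assumptions rather than the MERF algorithm: the variances `SIGMA_B` and `SIGMA_E` are assumed known here, whereas in practice they are estimated from the data, and all names are hypothetical.

```python
import random
from statistics import mean

random.seed(0)

SIGMA_B = 3.0  # assumed std. dev. of the per-cluster random intercept
SIGMA_E = 1.0  # assumed std. dev. of the observation noise
MU = 10.0      # assumed population-level mean (the fixed effect)


def simulate(n_clusters=20, n_test=50):
    """Return {cluster: (train, test)} drawn from y = MU + b_cluster + noise."""
    data = {}
    for c in range(n_clusters):
        b = random.gauss(0.0, SIGMA_B)       # cluster-specific offset
        n_train = random.choice([2, 5, 30])  # unbalanced cluster sizes
        train = [MU + b + random.gauss(0.0, SIGMA_E) for _ in range(n_train)]
        test = [MU + b + random.gauss(0.0, SIGMA_E) for _ in range(n_test)]
        data[c] = (train, test)
    return data


data = simulate()

# Simple estimate of the population mean from all training observations.
grand_mean = mean(y for train, _ in data.values() for y in train)


def estimates(train):
    """Return (complete pooling, no pooling, partial pooling) predictions."""
    n = len(train)
    cluster_mean = mean(train)
    # Shrinkage factor for a random-intercept model with known variances.
    shrink = n * SIGMA_B**2 / (n * SIGMA_B**2 + SIGMA_E**2)
    partial = grand_mean + shrink * (cluster_mean - grand_mean)
    return grand_mean, cluster_mean, partial


def mse(which):
    """Held-out mean squared error for estimator index 0, 1, or 2."""
    return mean(
        (y - estimates(train)[which]) ** 2
        for train, test in data.values()
        for y in test
    )


mse_complete, mse_none, mse_partial = mse(0), mse(1), mse(2)
print(f"complete pooling: {mse_complete:.2f}")
print(f"no pooling:       {mse_none:.2f}")
print(f"partial pooling:  {mse_partial:.2f}")
```

With random effects this large, complete pooling should do clearly worse than the other two, because it ignores the cluster offsets entirely. The gap between no pooling and partial pooling is small in this setup; shrinkage matters more when clusters are tiny or the noise is large relative to the cluster-to-cluster variation.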
