Tuesday 2:00 PM–2:45 PM in The Trojan Ballroom / ML

Using Simpson’s Paradox to Discover Interesting Patterns in Behavioral Data

Nazanin Alipourfard 🌴, Peter Fennell

Audience level:
Intermediate

Description

This package takes a step toward handling the large size and heterogeneity of behavioral data by automating discovery through Simpson’s Paradox. The paradox is a phenomenon in which a population as a whole shows a trend that differs from, or in some cases even reverses, the trend observed in its underlying subgroups.
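As a small illustration of the reversal described above, the sketch below compares success rates for two options, X and Y, within two subgroups and then in the pooled population. The counts, the option names, and the subgroup names are all made up for this example and do not come from the talk or its datasets.

```python
# Made-up counts chosen only to show the reversal: (successes, trials).
counts = {
    "subgroup_1": {"X": (8, 10), "Y": (70, 100)},
    "subgroup_2": {"X": (40, 100), "Y": (3, 10)},
}


def rate(successes, trials):
    """Fraction of trials that succeeded."""
    return successes / trials


# Within each subgroup, check whether X has the higher success rate.
within = {
    group: rate(*by_option["X"]) > rate(*by_option["Y"])
    for group, by_option in counts.items()
}


def pooled(option):
    """Success rate of one option after pooling all subgroups."""
    successes = sum(by_option[option][0] for by_option in counts.values())
    trials = sum(by_option[option][1] for by_option in counts.values())
    return rate(successes, trials)


overall_x = pooled("X")
overall_y = pooled("Y")

print("X beats Y within each subgroup:", within)
print("pooled rates (X, Y):", round(overall_x, 3), round(overall_y, 3))
```

X wins inside both subgroups yet loses after pooling, because most of X's trials fall in the subgroup where success is rare for everyone.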

Abstract

Big data promises to expand our understanding of human behavior by opening large swaths of it to empirical analysis. Yet, due to its size and heterogeneity, behavioral data poses many analytic and algorithmic challenges, which can confound analysis and make it hard to identify interesting phenomena in the data. By dissecting the data into more homogeneous subgroups, our method can systematically uncover surprising subgroups that behave differently from the rest of the population. We apply our fully automatic method to three datasets: StackExchange, a Q&A website, and KhanAcademy and Duolingo, two online learning platforms. The method yields good results on all three.
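One way to picture the disaggregation step is the sketch below. It is not the speakers' algorithm, only a simplified stand-in: it fits a least-squares slope on the whole dataset, then splits the rows by each candidate variable and flags a variable when every subgroup slope has the opposite sign to the overall one. The function names `slope` and `find_reversals`, the field names, and the synthetic rows are hypothetical, and the sketch does no significance testing.

```python
from collections import defaultdict


def slope(points):
    """Least-squares slope of y on x for (x, y) pairs; x must not be constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def find_reversals(rows, x, y, candidates):
    """Return the overall slope and the candidates whose subgroups all reverse it."""
    overall = slope([(r[x], r[y]) for r in rows])
    flagged = []
    for var in candidates:
        # Split the rows into subgroups by the value of the candidate variable.
        groups = defaultdict(list)
        for r in rows:
            groups[r[var]].append((r[x], r[y]))
        subgroup_slopes = [slope(pts) for pts in groups.values()]
        # Flag the variable only if every subgroup trend opposes the overall trend.
        if all(s * overall < 0 for s in subgroup_slopes):
            flagged.append(var)
    return overall, flagged


# Synthetic rows: y falls with x inside each cohort, but later cohorts
# start from a higher baseline, so the pooled trend rises.
rows = [
    {"x": 10 * k + i, "y": 20 * k - i, "cohort": k, "parity": i % 2}
    for k in range(3)
    for i in range(5)
]

overall, flagged = find_reversals(rows, "x", "y", ["cohort", "parity"])
print("overall slope:", round(overall, 3))
print("variables whose subgroups reverse the trend:", flagged)
```

In this toy data only `cohort` is flagged: splitting by `parity` leaves the rising trend intact, so that variable does not explain any reversal. A practical version would also need to bin continuous variables and test whether each subgroup trend is statistically reliable.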
