Saturday 3:00 PM–3:45 PM in Room #1023/1022/1020 (1st Floor)

Clustering: A Guide for the Perplexed

Leland McInnes, John Healy

Audience level:
Novice

Description

Finding clusters is a powerful tool for understanding and exploring data. While the task sounds easy, it can be surprisingly difficult to do it well. Most standard clustering algorithms can, and do, provide very poor clustering results in many cases. We discuss how to do clustering correctly.

Abstract

Finding clusters is a powerful tool for understanding and exploring data. While the task sounds easy, it can be surprisingly difficult to it well. Most standard clustering algorithms can, and do, provide very poor clustering results in many cases. Our intuitions for what a cluster is are not as clear as we would like, and can easily be lead astray. We will attempt to find a definition of clustering that makes sense for most cases, and introduce an algorithm for finding such clusters, along with a high performance python implementation of the algorithm, building up more intuition for what clustering really means as we go.