Finding clusters is a powerful tool for understanding and exploring data. While the task sounds easy, it can be surprisingly difficult to do it well. Most standard clustering algorithms can, and do, provide very poor clustering results in many cases. We discuss how to do clustering correctly.
Finding clusters is a powerful tool for understanding and exploring data. While the task sounds easy, it can be surprisingly difficult to it well. Most standard clustering algorithms can, and do, provide very poor clustering results in many cases. Our intuitions for what a cluster is are not as clear as we would like, and can easily be lead astray. We will attempt to find a definition of clustering that makes sense for most cases, and introduce an algorithm for finding such clusters, along with a high performance python implementation of the algorithm, building up more intuition for what clustering really means as we go.