Sunday 2:10 p.m.–2:50 p.m.

Hierarchical Data Clustering in Python

Frank Kelly

Audience level:
Novice

Description

Clustering is an increasingly important task for many data scientists. This talk explores the challenge of hierarchical clustering of text data for summarisation purposes. We'll look at two solutions now available to Python users, the relevant scikit-learn modules and Elasticsearch (with the carrot2 plugin), and compare visualisations from both approaches.

Abstract

  • Background: methods for clustering text data and the challenge of data summarisation
  • Hierarchical clustering: agglomerative vs divisive
  • The sklearn.cluster and sklearn.metrics modules
  • Elasticsearch + carrot2 plugin
  • Performance comparisons; assessment of scalability and ease of use
  • Static visualisation using Matplotlib, interactive visualisation using FoamTree
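
As a rough illustration of the scikit-learn route listed above, here is a minimal sketch of agglomerative (bottom-up) clustering on a handful of short texts. It is not the speaker's code: the sample documents, the function name cluster_documents, and the choice of TF-IDF features with Ward linkage are all assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus: two documents about Python clustering,
# two about Elasticsearch.
SAMPLE_DOCS = [
    "python clustering library for machine learning",
    "machine learning clustering in python",
    "elasticsearch search engine plugin",
    "search plugin for elasticsearch engine",
]


def cluster_documents(docs, n_clusters=2):
    """Cluster texts bottom-up and return (labels, silhouette score)."""
    # Turn each document into a TF-IDF vector, dropping English stop words.
    vectorizer = TfidfVectorizer(stop_words="english")
    # AgglomerativeClustering expects a dense array, so convert the sparse matrix.
    features = vectorizer.fit_transform(docs).toarray()

    # Ward linkage merges the pair of clusters that least increases
    # within-cluster variance (it uses Euclidean distance).
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    labels = model.fit_predict(features)

    # sklearn.metrics provides silhouette_score as one way to assess
    # how well separated the resulting clusters are (range -1 to 1).
    score = silhouette_score(features, labels)
    return labels, score


if __name__ == "__main__":
    labels, score = cluster_documents(SAMPLE_DOCS)
    for doc, label in zip(SAMPLE_DOCS, labels):
        print(label, doc)
    print("silhouette:", round(score, 3))
```

Converting to a dense array is fine for a toy corpus like this one but becomes memory-hungry for large collections, which is one reason the scalability comparison in the outline matters.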
