Sunday 11:00–11:45 in LG6

Segmenting Channel 4 Viewers using LDA Topic Modelling

Thomas Nuttall

Audience level:


In this talk I will walk through how we used LDA Topic Modelling to segment Channel 4 viewers based on their online viewing behaviour. Combining segment viewing with demographic information and interests/hobbies/lifestyles from survey data provides a wonderfully nuanced understanding of our viewers and allows us to tailor comms, All 4, commissioned content, and recommendations to specific tastes.


We wanted to tailor the All 4 experience to the different types of people we know use our service. If, as we suspect, viewing preferences are a reflection of personality, then perhaps the different types of people we wish to cater for can be identified by looking at their viewing behaviour on All 4?

Traditionally, topic models expect a corpus of documents as input. Here we substitute viewers for documents and views for words, so each viewer is treated as a "document" made up of the shows they have watched. The patterns (topics) identified are consequently patterns in viewing.
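To make the document/viewer analogy concrete, here is a minimal sketch of how a view log might be turned into the bag-of-words input a topic model expects. The viewer ids, show names, and helper functions are all hypothetical and chosen for illustration; the talk does not describe the actual data pipeline.

```python
from collections import Counter

# Hypothetical view log: one (viewer_id, show) pair per view.
views = [
    ("v1", "show_a"), ("v1", "show_a"), ("v1", "show_b"),
    ("v2", "show_c"), ("v2", "show_a"),
    ("v3", "show_c"), ("v3", "show_c"),
]


def build_corpus(views):
    """Group views by viewer: each viewer becomes a 'document' of show counts."""
    docs = {}
    for viewer_id, show in views:
        docs.setdefault(viewer_id, Counter())[show] += 1
    # The vocabulary is the set of shows, in a fixed (sorted) order.
    vocab = sorted({show for _, show in views})
    return vocab, docs


def to_count_vector(doc, vocab):
    """Turn one viewer's show counts into a vector aligned with the vocabulary."""
    return [doc.get(show, 0) for show in vocab]


vocab, docs = build_corpus(views)
matrix = {viewer: to_count_vector(doc, vocab) for viewer, doc in docs.items()}
```

Each row of `matrix` plays the role of a document's term-count vector, with shows standing in for words.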

MLlib's LDA topic model is used to recognize these patterns in our viewers’ behaviour on All 4, and it allows us to segment viewers based on how closely aligned their viewing is with each of the topics found.
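The abstract does not say exactly how alignment with a topic is turned into a segment. One simple reading, sketched below as an assumption, is to assign each viewer to the topic carrying the largest weight in their viewer-topic mixture, optionally leaving viewers unassigned when no topic is dominant. The mixture values, the `assign_segments` helper, and the `min_weight` threshold are hypothetical.

```python
def assign_segments(topic_mixtures, min_weight=0.0):
    """Map each viewer to the index of their highest-weighted topic.

    topic_mixtures: {viewer_id: [weight for topic 0, topic 1, ...]}
    min_weight: viewers whose best topic falls below this stay unassigned (None).
    """
    segments = {}
    for viewer, weights in topic_mixtures.items():
        best = max(range(len(weights)), key=weights.__getitem__)
        segments[viewer] = best if weights[best] >= min_weight else None
    return segments


# Made-up viewer-topic mixtures of the kind a fitted LDA model would produce.
mixtures = {
    "v1": [0.70, 0.20, 0.10],
    "v2": [0.10, 0.30, 0.60],
    "v3": [0.40, 0.35, 0.25],
}
segments = assign_segments(mixtures, min_weight=0.5)
```

A threshold like `min_weight` is one way to keep viewers with mixed tastes out of a single segment; a softer alternative would be to keep the full mixture per viewer.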

The model is updated periodically using an online variational Bayes algorithm, which processes a subset of the corpus on each iteration and updates the term-topic distribution adaptively, allowing us to incorporate new shows (that is, new words).
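As a rough illustration of the update described above, the sketch below shows the core step of online variational Bayes for LDA: the current topic-term parameters are blended with an estimate from the latest mini-batch, using a step size that decays over iterations. It is not MLlib's implementation. The function names, the default values of `tau0`, `kappa`, and `eta`, and the idea of adding new shows by appending columns initialised at the prior are assumptions made for illustration.

```python
def learning_rate(t, tau0=1024.0, kappa=0.51):
    """Decaying step size rho_t = (tau0 + t) ** -kappa for iteration t."""
    return (tau0 + t) ** -kappa


def add_terms(lam, n_new, eta=0.01):
    """Append columns for n_new unseen shows, initialised at the prior eta.

    lam is a topics x terms matrix (list of rows). This is an assumed way
    of growing the vocabulary, not a documented library feature.
    """
    return [row + [eta] * n_new for row in lam]


def blend(lam, lam_hat, rho):
    """Online update: lam <- (1 - rho) * lam + rho * lam_hat, elementwise."""
    return [
        [(1 - rho) * old + rho * new for old, new in zip(row_old, row_new)]
        for row_old, row_new in zip(lam, lam_hat)
    ]


# Two topics over two shows, then one new show arrives.
lam = [[1.0, 2.0], [3.0, 4.0]]
lam = add_terms(lam, n_new=1, eta=0.5)

# lam_hat stands in for the estimate computed from the latest mini-batch.
lam_hat = [[2.0, 2.0, 2.5], [3.0, 6.0, 0.5]]
updated = blend(lam, lam_hat, rho=0.5)
```

Because the step size shrinks as `t` grows, later mini-batches nudge the topics rather than overwrite them, which is what lets the model absorb new viewing without refitting from scratch.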

Demographic information on our viewers is overlaid to understand the gender and age splits of the segments, whilst survey data from a subset of each segment enriches our picture of the kinds of viewer in each group.

This adds up to a detailed picture of the distinct groups of viewers we serve and affords us the opportunity to tailor the service we provide, hopefully for the better.
