Saturday October 30 2:00 PM – Saturday October 30 2:30 PM in Talks I

Extracting complements and substitutes from sales data - a network perspective

Sebastian Lautz

Prior knowledge:
Previous knowledge expected
Some familiarity with networks/graphs would be helpful.


Substitutability and complementarity are important properties in retail for forecasting demand or optimising the product catalogue according to customer needs. This talk, intended for Data Scientists working in retail or those generally interested in learning more about network science techniques, will present a network-based approach to modelling these product relationships.


In order to more accurately forecast demand or make ranging decisions for supermarket shelves, it is important to model product interactions, such as substitutability and complementarity.

Products are not bought in isolation. For example, customers usually buy hot dogs and buns together (complementarity), or more often than not choose one brand of cola over another (substitutability).

The standard economic approach to this problem characterises complementarity and substitutability in terms of the sign of the cross-elasticity of demand (how one product’s demand goes up or down in response to price changes of another).
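
As a reminder of the sign convention, the textbook definition of cross-elasticity can be written as below. The symbols Q_A (quantity demanded of product A) and P_B (price of product B) and the percentages in the worked lines are illustrative assumptions, not figures from the talk.

```latex
% Cross-elasticity of demand of product A with respect to the price of product B
E_{AB} \;=\; \frac{\partial Q_A / Q_A}{\partial P_B / P_B}
       \;=\; \frac{\partial Q_A}{\partial P_B}\cdot\frac{P_B}{Q_A}

% Sign convention
E_{AB} > 0 \;\Rightarrow\; \text{A and B are substitutes}, \qquad
E_{AB} < 0 \;\Rightarrow\; \text{A and B are complements}

% Illustrative numbers only:
% price of cola B rises by 10\%, demand for cola A rises by 5\%
E_{AB} \approx \frac{+5\%}{+10\%} = 0.5 \quad (\text{substitutes})
% price of hot dogs rises by 10\%, demand for buns falls by 4\%
E_{AB} \approx \frac{-4\%}{+10\%} = -0.4 \quad (\text{complements})
```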

However, the price data needed to estimate these elasticities is not always readily available. Other approaches require customer profiles and further potentially personally identifiable information, which raises privacy concerns.

In this talk, I propose an alternative approach that solely relies on basket data, and the underlying network structure in terms of products and their co-occurrence in baskets.

More concretely, I start from a bipartite product-purchase network in which products are connected to the transactions they appear in. The main idea is that complements are products that are bought together significantly more often than expected by chance, whereas substitutes share the same complements but are bought together significantly less often. I develop appropriate null models to infer these relations, and use measures based on random walks on the network to quantify their importance.
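
To make the "significantly more or less frequently" idea concrete, here is a minimal sketch in Python. It assumes a simple hypergeometric null model, in which one product's baskets are drawn at random without regard to the other product. This is a stand-in chosen for illustration and not necessarily the null model developed in the talk. The toy baskets and the function name `cooccurrence_pvalues` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_pvalues(baskets):
    """For every product pair, return (co-occurrences, p_more, p_less).

    p_more: probability of at least the observed number of co-occurrences
    p_less: probability of at most the observed number of co-occurrences
    Both are computed under a hypergeometric null model (an assumption).
    """
    n_baskets = len(baskets)
    support = Counter()      # number of baskets containing each product
    pair_counts = Counter()  # number of baskets containing both products
    for basket in baskets:
        items = sorted(set(basket))
        support.update(items)
        pair_counts.update(combinations(items, 2))

    results = {}
    for a, b in combinations(sorted(support), 2):
        k = pair_counts[(a, b)]
        n_a, n_b = support[a], support[b]
        lowest = max(0, n_a + n_b - n_baskets)
        highest = min(n_a, n_b)

        def pmf(i):
            # Probability of exactly i shared baskets under the null model
            return comb(n_a, i) * comb(n_baskets - n_a, n_b - i) / comb(n_baskets, n_b)

        p_more = sum(pmf(i) for i in range(k, highest + 1))
        p_less = sum(pmf(i) for i in range(lowest, k + 1))
        results[(a, b)] = (k, p_more, p_less)
    return results


# Hypothetical toy data: 12 baskets, each a set of product names
baskets = [
    {"hotdogs", "buns", "cola_a"}, {"hotdogs", "buns", "cola_b"},
    {"hotdogs", "buns", "cola_a"}, {"hotdogs", "buns", "cola_b"},
    {"hotdogs", "buns", "cola_a"}, {"cola_b"}, {"cola_a"}, {"cola_b"},
    {"cola_a"}, {"cola_b"}, {"cola_a"}, {"cola_b"},
]

pvalues = cooccurrence_pvalues(baskets)
for pair in [("buns", "hotdogs"), ("cola_a", "cola_b")]:
    k, p_more, p_less = pvalues[pair]
    print(pair, "together:", k, "p_more:", p_more, "p_less:", p_less)
```

In this toy data, hot dogs and buns always appear together and the two colas never do, so the first pair has a small `p_more` and the second a small `p_less` (both roughly 0.001). The sketch covers only the co-occurrence half of the substitute definition: checking that two candidate substitutes share the same complements would be a further step, and testing many pairs on real data would call for a multiple-testing correction.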

The resulting weighted (unipartite) networks between products are then analysed with community detection methods to identify groups of substitutes or complements. Finally, we validate our approach by comparing our findings on real-world basket data to existing product hierarchies, as well as recipe data.
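
As a rough illustration of the grouping step, the sketch below runs a deterministic variant of weighted label propagation, one of the simpler community detection methods, on a small hypothetical complement network. The talk does not say which community detection methods are used, so the algorithm, the edge weights, and the function name `label_propagation` are all assumptions made for this example.

```python
from collections import defaultdict


def label_propagation(edges, max_rounds=100):
    """Group nodes of a weighted undirected graph by label propagation.

    edges: dict mapping (u, v) -> positive weight.
    Nodes are visited in sorted order and ties are broken by the smallest
    label, which makes this variant deterministic (a simplification).
    """
    neighbours = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            # Total edge weight towards each label among the neighbours
            scores = defaultdict(float)
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical complement network: weights stand for complementarity strength
edges = {
    ("buns", "hotdogs"): 3.0, ("hotdogs", "mustard"): 2.0,
    ("buns", "mustard"): 2.5, ("pasta", "pesto"): 3.0,
    ("parmesan", "pesto"): 2.5, ("parmesan", "pasta"): 2.0,
    ("mustard", "pasta"): 0.2,  # weak link between the two groups
}

communities = label_propagation(edges)
print(communities)
```

On this toy network the weak mustard-pasta link is not enough to merge the two groups, so the hot dog products and the pasta products end up in separate communities.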

This admittedly theory-heavy talk will be of particular interest to Data Scientists interested in network science techniques or generally in how to model product interactions in retail. Some familiarity with graph/network terminology is helpful.

By the end, I hope to have convinced you that our network approach opens up a promising new angle on modelling product relationships, and to have given you an idea of what the vast network science toolbox has to offer.