Saturday 16:15–17:00 in Auditorium

Real-time association mining in large social networks

Ben Chamberlain

Audience level:
Intermediate

Description

Social media can be used to perceive the relationships between individuals, companies and brands. Understanding the relationships between key entities is of vital importance for decision support in a swathe of industries. We present a real-time method to query and visualise regions of networks that could represent an industry, a sport, a political party, etc.

Abstract

There is a growing realisation that, to combat the waning effectiveness of traditional marketing, social media platform owners need to find new ways to monetise their data. Social media data contains rich information describing how real-world entities relate to each other. Understanding the allegiances, communities and structure of key entities is of vital importance for decision support in a swathe of industries that have hitherto relied on expensive, small-scale survey data. We present a real-time method to query and visualise regions of networks that are closely related to a set of input vertices. The input vertices can define an industry, a political party, a sport, etc. The key idea is that, in large digital social networks, measuring similarity via direct connections between nodes is not robust, whereas robust similarities between nodes can be obtained by comparing their neighbourhood graphs. We achieve real-time performance by compressing the neighbourhood graphs into MinHash signatures and by answering queries rapidly through Locality Sensitive Hashing (LSH). These techniques reduce query times from hours on industrial desktop machines to milliseconds on standard laptops. Our method allows analysts to interactively explore strongly associated regions of large networks in real time. Our work has been deployed in Python-based software and uses the SciPy stack (specifically NumPy, pandas, scikit-learn and matplotlib) as well as the Python igraph implementation.
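
As a rough illustration of the neighbourhood-compression idea described above, the following minimal sketch computes minhash signatures for vertex neighbourhoods and indexes them with a banded LSH table. It is not the implementation used in the talk: the function names, the linear hash family, the toy graph and the choice of 128 hashes split into 32 bands of 4 rows are all assumptions made for the example, and it uses only the Python standard library.

```python
import random
from collections import defaultdict

# Assumption: vertex ids are non-negative integers smaller than this prime.
PRIME = 2_147_483_647  # 2**31 - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(v) = (a * v + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(neighbours, params):
    """Compress a non-empty set of neighbour ids into a fixed-length signature."""
    return tuple(min((a * v + b) % PRIME for v in neighbours)
                 for a, b in params)


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def band_keys(signature, bands, rows):
    """Split a signature into `bands` chunks of `rows` values each."""
    return [(i, signature[i * rows:(i + 1) * rows]) for i in range(bands)]


def build_lsh_index(signatures, bands, rows):
    """Map each band key to the set of vertices sharing that band."""
    index = defaultdict(set)
    for vertex, signature in signatures.items():
        for key in band_keys(signature, bands, rows):
            index[key].add(vertex)
    return index


def query(vertex, signatures, index, bands, rows):
    """Return candidate vertices sharing at least one band, most similar first."""
    signature = signatures[vertex]
    candidates = set()
    for key in band_keys(signature, bands, rows):
        candidates |= index.get(key, set())
    candidates.discard(vertex)
    return sorted(candidates,
                  key=lambda v: estimate_jaccard(signature, signatures[v]),
                  reverse=True)


# Toy neighbourhoods (hypothetical data): vertex name -> set of neighbour ids.
neighbourhoods = {
    "a": {1, 2, 3, 4, 5, 6, 7, 8},
    "b": {1, 2, 3, 4, 5, 6, 7, 8},      # identical to "a"
    "c": {1, 2, 3, 4, 5, 6, 9, 10},     # overlaps with "a"
    "d": {100, 101, 102, 103},          # disjoint from "a"
}

BANDS, ROWS = 32, 4
params = make_hash_params(BANDS * ROWS)
signatures = {v: minhash_signature(n, params) for v, n in neighbourhoods.items()}
index = build_lsh_index(signatures, BANDS, ROWS)
results = query("a", signatures, index, BANDS, ROWS)
print(results)
```

A query only touches the buckets that share a band with the query vertex, so its cost depends on the number of candidates rather than on the size of the network. Using more rows per band makes the index stricter, while using more bands makes it more permissive.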