Sunday 1:15 PM–2:00 PM in Room #1023/1022/1020 (1st Floor)

Dynamics in Graph Analysis: Adding Time as a Structure for Visual and Statistical Insight

Benjamin Bengfort

Audience level:
Intermediate

Description

Network analyses are powerful methods for both visual analytics and machine learning, but they become harder to read and to compute as their complexity increases. By embedding time as a structural element rather than a property, we will explore how time series and interactive analyses can be improved on graph structures. Primarily, we will look at decomposition in NLP-extracted concept graphs using NetworkX and Graph Tool.

Abstract

Modeling data as networks of relationships between entities can be a powerful method for both visual analytics and machine learning: people are very good at distinguishing patterns in interconnected structures, and machine learning methods can perform better when applied to graph data structures. However, as these structures become more complex or accumulate information over time, both visual and algorithmic methods get messy; visual analyses suffer from the "hairball" effect, and graph algorithms require either more traversal or more computation at each vertex. A growing area of work that reduces this complexity and optimizes analytics is the use of interactive and subgraph techniques that model how graph structures change over time.

In this talk, I demonstrate two practical techniques for embedding time into graphs, not as computational properties, but as structural elements. The first technique adds time as a node in the graph, which keeps the graph static and complete while minimizing traversals and enabling filtering. The second represents a single graph as multiple subgraphs, each a snapshot at a particular time. This lets us apply time series analytics to our graphs and, perhaps more importantly, use animation or interactive methods to explore those changes visually and reveal meaningful dynamics.
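Both techniques can be sketched in a few lines of NetworkX. This is a minimal illustration under stated assumptions, not the talk's code: the `(entity, entity, period)` events are made up (think author, file, and day from a commit log), and the helper names `time_as_node`, `active_at`, and `snapshots` are hypothetical. Time nodes are keyed as `("time", period)` tuples so they cannot collide with entity names.

```python
from collections import defaultdict

import networkx as nx

# Hypothetical timestamped interactions: (entity_a, entity_b, period).
# In a commit graph these might be (author, file, day); the values are made up.
EVENTS = [
    ("alice", "parser.py", "2016-01-01"),
    ("bob", "parser.py", "2016-01-01"),
    ("alice", "cli.py", "2016-01-02"),
    ("carol", "cli.py", "2016-01-02"),
    ("carol", "README", "2016-01-03"),
]


def time_as_node(events):
    """Technique 1: one static graph in which every period is itself a node."""
    G = nx.Graph()
    for a, b, period in events:
        t = ("time", period)  # tuple key keeps time nodes apart from entities
        G.add_node(t, kind="time")
        G.add_edge(a, b)
        # Link both endpoints to the period in which they interacted.
        G.add_edge(a, t)
        G.add_edge(b, t)
    return G


def active_at(G, period):
    """Filter by time with a single hop out of the time node."""
    return set(G.neighbors(("time", period)))


def snapshots(events):
    """Technique 2: one graph viewed as a sequence of per-period subgraphs."""
    # A MultiGraph keeps repeated interactions between the same pair
    # in different periods as separate edges instead of overwriting them.
    G = nx.MultiGraph()
    for a, b, period in events:
        G.add_edge(a, b, period=period)
    by_period = defaultdict(list)
    for u, v, key, period in G.edges(keys=True, data="period"):
        by_period[period].append((u, v, key))
    # Each snapshot is a read-only view onto the single underlying graph.
    return {p: G.edge_subgraph(edges) for p, edges in sorted(by_period.items())}


G = time_as_node(EVENTS)
print(sorted(active_at(G, "2016-01-01")))  # ['alice', 'bob', 'parser.py']

frames = snapshots(EVENTS)
# A simple time series: edge count per snapshot.
print({p: g.number_of_edges() for p, g in frames.items()})
# {'2016-01-01': 2, '2016-01-02': 2, '2016-01-03': 1}
```

Because each snapshot here is an `edge_subgraph` view rather than a copy, the sequence of frames stays tied to one graph; per-frame measures like the edge counts above are one way to get a time series, and the same frames could drive an animation.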

We explore these dynamics using a concept graph extracted from a natural language corpus with NLP techniques, as well as a graph of GitHub commits. The graphs will be created and analyzed with NetworkX, and visualized with Graph Tool and matplotlib. The outline is as follows:

  1. An introduction to our graphs and possible insights
  2. Where can we find the time? Data wrangling for time as a property
  3. Static analyses: moving time from a property model to a structural element
  4. Traversals and subgraphs of time
  5. The "betweenness" of time and new node proximities
  6. A small overview of network visualization
  7. Visualizing static layouts with time-based edge properties
  8. Adding dynamics for interactive analysis
  9. Animating the change in structure of a graph over time
  10. Finding insights with visual analytics: overview first, zoom and filter, details on demand
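Outline items 3 to 5 can be illustrated with a small NetworkX sketch. It rests on assumptions rather than the talk's material: the graph and the activity log are made up, and betweenness centrality (`nx.betweenness_centrality`) is used as one plausible reading of the "betweenness" of time. The sketch shows a time node that bridges otherwise separate groups of entities scoring higher than one that does not, and two entities with no path between them becoming two hops apart through a shared time node.

```python
import networkx as nx

# Hypothetical entity graph: two groups with no edges between them.
G = nx.Graph()
G.add_edges_from([("alice", "parser.py"), ("bob", "parser.py"), ("carol", "README")])

# Hypothetical activity log: which entities were active in which period.
activity = {
    "2016-01-01": ["alice", "bob", "parser.py"],
    "2016-01-02": ["alice", "carol", "README"],
}
# Move time from a property to a structural element: one node per period.
for period, members in activity.items():
    t = ("time", period)
    G.add_node(t, kind="time")
    for member in members:
        G.add_edge(t, member)

# Betweenness of time: share of shortest paths that pass through each time node.
bc = nx.betweenness_centrality(G)
time_bc = {n: score for n, score in bc.items() if G.nodes[n].get("kind") == "time"}
print(max(time_bc, key=time_bc.get))  # ('time', '2016-01-02')

# New proximities: compare reachability with and without the time nodes.
entities = G.subgraph([n for n, kind in G.nodes(data="kind") if kind != "time"])
print(nx.has_path(entities, "alice", "carol"))       # False
print(nx.shortest_path_length(G, "alice", "carol"))  # 2
```

Whether time nodes should be ranked alongside entities or reported separately is a design choice; this sketch keeps their scores in a separate dictionary so they do not crowd out entity rankings.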

All code will be made open source, as will the data sets that we collect!