Applying graph analytics to data stored in relational databases can provide tremendous value in many application domains. We discuss the importance of leveraging these analyses and the challenges in enabling them. We present a tool, called GraphGen, that allows users to visually explore and rapidly analyze (using NetworkX) the different graph structures present in their databases.
Analyzing the interconnections among the entities or objects in a dataset with graph analytics (network science) has been shown to provide tremendous value in many application domains. However, most data today is not stored as graphs. Individuals and businesses instead tend to choose more general-purpose storage systems, the most popular being relational databases, which are often preferred for the strong integrity and consistency guarantees they provide. Users are therefore forced to manually extract data from their data stores, construct the requisite graphs, and then load them into a graph engine (like NetworkX) to execute their graph analysis task. Moreover, users may not know exactly which graphs in their dataset they would like to analyze, or whether a particular graph would be useful at all. Because this extract-construct-load cycle may have to be repeated for many candidate graphs, the process is not only tedious and cumbersome but also computationally intensive and time-consuming.
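To make the manual workflow concrete, here is a minimal sketch of the extract-and-construct steps for a co-authorship graph. It assumes a hypothetical DBLP-style schema (tables `Author` and `AuthorPub`, invented for illustration) held in an in-memory SQLite database. An SQL self-join derives an edge between every pair of authors who share a publication, and the edges are then turned into a plain adjacency map so the sketch runs with the standard library alone. This illustrates the hand-written step a user would repeat for each candidate graph; it is not GraphGen's method.

```python
import sqlite3
from collections import defaultdict


def build_schema(conn):
    # Hypothetical DBLP-style schema: authors plus an author-publication link table.
    conn.executescript("""
        CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE AuthorPub (author_id INTEGER, pub_id INTEGER);
    """)
    conn.executemany("INSERT INTO Author VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cai"), (4, "Dee")])
    conn.executemany("INSERT INTO AuthorPub VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 10), (3, 11), (4, 11)])


def extract_coauthor_edges(conn):
    # Self-join the link table: two authors are adjacent if they share a publication.
    # a1.author_id < a2.author_id keeps each undirected edge once and drops self-loops.
    rows = conn.execute("""
        SELECT DISTINCT x.name, y.name
        FROM AuthorPub a1
        JOIN AuthorPub a2
          ON a1.pub_id = a2.pub_id AND a1.author_id < a2.author_id
        JOIN Author x ON x.id = a1.author_id
        JOIN Author y ON y.id = a2.author_id
        ORDER BY x.name, y.name
    """)
    return rows.fetchall()


def to_adjacency(edges):
    # Undirected adjacency map: node -> set of neighbours.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


conn = sqlite3.connect(":memory:")
build_schema(conn)
edges = extract_coauthor_edges(conn)
adj = to_adjacency(edges)
print(edges)
print({node: len(nbrs) for node, nbrs in sorted(adj.items())})
```

In practice, the `edges` list could instead be handed to a graph engine, for example `networkx.Graph(edges)`, to run analyses such as centrality or community detection. Every new graph of interest (say, authors linked through shared venues) requires writing and running another such query by hand.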
We present GraphGen, a system geared towards making the extraction of graphs from relational databases effortless and enabling visual exploration of the wide variety of inherent or hidden graphs within structured relational datasets. GraphGen consists of a Web application for visually exploring the potential graphs and a Python library, GraphGenPy, that lets users declare graph extraction tasks in our intuitive domain-specific language (DSL); these tasks are then executed automatically over a relational database. In this talk, I will discuss the challenges in achieving these goals and give an overview of the end-to-end process of running graph analyses over a relational database schema using GraphGenPy and NetworkX.