Presentation: Turning Pandas DataFrames to Semantic Knowledge Graph

Friday October 29 11:30 AM – Friday October 29 12:00 PM in Talks I

Cheuk Ting Ho

Prior knowledge: No previous knowledge expected

Summary

Storing data in tables has its limitations. Representing more complicated datasets and extracting the desired data usually require joins and aggregations. Storing data in a semantic knowledge graph may be the solution, and I will show you how to switch programmatically from pandas to a knowledge graph.

Description

Remember how many times you have looked up “how to do this in pandas”? Though pandas is the most popular data handling library in Python, it can be quite complicated because of the rigidity of storing data in tabular form. This is most obvious when the data is imported from a JSON file and ends up with multiple layers of nested objects. At that point, you may wish for a data structure that lets you store data as objects and subclasses, just like in object-oriented programs. The answer? Semantic knowledge graphs.
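To make the JSON problem concrete, here is a minimal pandas sketch using hypothetical records. `pd.json_normalize` flattens nested dictionaries into dotted column names, but a nested list of objects has to be split off into a second table that repeats the parent's keys.

```python
import pandas as pd

# Hypothetical nested records, shaped like a typical JSON API response
records = [
    {
        "id": 1,
        "name": "Alice",
        "address": {"city": "London", "country": "UK"},
        "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    },
    {
        "id": 2,
        "name": "Bob",
        "address": {"city": "Paris", "country": "FR"},
        "orders": [{"sku": "A1", "qty": 5}],
    },
]

# Nested dicts become dotted columns such as "address.city",
# but the list of orders stays as an opaque Python list in one column.
people = pd.json_normalize(records)
print(people.columns.tolist())

# Unpacking the orders needs a second table that repeats the parent keys.
orders = pd.json_normalize(records, record_path="orders", meta=["id", "name"])
print(orders)
```

One customer with their orders now lives in two DataFrames linked only by the repeated `id` column, which is exactly the join bookkeeping that a graph of linked objects avoids.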

In this talk, Cheuk will first introduce what semantic knowledge graphs are, starting with their building block, the triple, and showing how all data can be described with triples in terms of objects and their properties. Cheuk will assume no prior knowledge and will explain through examples and visualizations made with the TerminusDB model builder, a graphical interface that lets you build schemas for semantic knowledge graphs.
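A minimal sketch of the triple idea in plain Python, without any graph database. Each fact is a `(subject, predicate, object)` tuple, an object is simply everything stated about one subject, and a relationship is a triple whose object is another subject. The identifiers, the `rdf:type`-style predicate and the helper names `match` and `describe` are illustrative choices, not TerminusDB's internal storage format.

```python
# Each fact is a (subject, predicate, object) triple; identifiers are hypothetical.
triples = {
    ("Person/alice", "rdf:type", "Person"),
    ("Person/alice", "name", "Alice"),
    ("Person/alice", "lives_in", "City/london"),
    ("Person/bob", "rdf:type", "Person"),
    ("Person/bob", "name", "Bob"),
    ("Person/bob", "lives_in", "City/london"),
    ("City/london", "rdf:type", "City"),
    ("City/london", "name", "London"),
}


def match(s=None, p=None, o=None):
    """Return matching triples in sorted order; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def describe(subject):
    """Collect one subject's properties into a dict (assumes single-valued properties)."""
    return {p: o for _, p, o in match(s=subject)}


# Who lives in London? Follow the lives_in edges backwards from the city.
residents = [s for s, _, _ in match(p="lives_in", o="City/london")]
print(residents)  # ['Person/alice', 'Person/bob']
print(describe("Person/alice"))
```

The same three-part shape covers attributes (`name`) and links between objects (`lives_in`), which is why no join table is needed to connect people to cities.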

In the next part, Cheuk will show how to construct a schema based on a pandas DataFrame. With the Python client of TerminusDB, the schema can be built programmatically, followed by importing the data from the DataFrame. This part assumes basic Python knowledge. Cheuk will show the internals of pandas, dissecting a DataFrame to reconstruct a knowledge graph schema, and will also show the code that transforms the data and inserts it into the prepared graph.
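One way the DataFrame-to-schema step might look, sketched under stated assumptions: the class definition is modelled on TerminusDB's JSON schema format (`"@type": "Class"`, an `"@id"`, and properties mapped to `xsd:` datatypes), the dtype-to-XSD mapping is a hypothetical choice, and the helpers `schema_from_dataframe` and `documents_from_dataframe` are illustrative names. Submitting the schema and documents through the TerminusDB Python client is deliberately not shown.

```python
import pandas as pd

# Hypothetical mapping from numpy dtype kind codes to XSD datatypes (not exhaustive).
DTYPE_TO_XSD = {
    "i": "xsd:integer",
    "u": "xsd:nonNegativeInteger",
    "f": "xsd:decimal",
    "b": "xsd:boolean",
    "M": "xsd:dateTime",
    "O": "xsd:string",
}


def schema_from_dataframe(df, class_name):
    """Build a TerminusDB-style class definition from a DataFrame's dtypes."""
    schema = {"@type": "Class", "@id": class_name}
    for column, dtype in df.dtypes.items():
        # Unknown kinds (e.g. some extension dtypes) fall back to strings.
        schema[column] = DTYPE_TO_XSD.get(dtype.kind, "xsd:string")
    return schema


def documents_from_dataframe(df, class_name):
    """Turn each row into a document tagged with the class it belongs to."""
    return [{"@type": class_name, **row} for row in df.to_dict(orient="records")]


people = pd.DataFrame(
    {"name": ["Alice", "Bob"], "age": [31, 42], "member": [True, False]}
)
schema = schema_from_dataframe(people, "Person")
docs = documents_from_dataframe(people, "Person")
print(schema)
print(docs[0])
```

Missing values (`NaN`) would need to be dropped or mapped to optional properties before insertion; this sketch does not handle them, nor does it detect columns that should become links to other classes.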

Finally, Cheuk will visualize the graph with a customized, interactive graph visualization in a Jupyter notebook.

This talk is for data scientists and engineers who work with data and use pandas a lot. They may need new tools and skills to expand their data-handling repertoire, and semantic knowledge graphs would be a high-value addition.