Sunday 11:45–12:30 in Tower Suite 1

Databases for Data Science

Alex Hendorf

Audience level:
Intermediate

Description

Databases can be very useful and a real powerhouse for data science projects. They have been around for decades and have been highly optimised for data aggregation over that time.

This presentation will provide a crash course on the major differences between the various types of database systems, such as relational, NoSQL and graph databases.

Abstract

Databases can be very useful and a real powerhouse for data science projects. They have been around for decades and have been highly optimised for data aggregation over that time.

There are hundreds of databases to choose from nowadays, each more or less specialised for its own use cases. Big Data has changed the database landscape massively in recent years, and many open source projects are now among the most popular databases around.

This presentation will provide a crash course on the major differences between the various types of database systems, such as relational, NoSQL and graph databases. We will discuss which databases are worth looking into for data science projects, along with their strengths and weaknesses. In particular, we will cover popular databases such as PostgreSQL, MongoDB, Cassandra and Neo4j, and discuss how they differ from Hadoop, Spark and other pokemon-y big data frameworks.

This presentation will provide answers for:
- Where can adding a database to your workflow help you in your projects?
- What should you know before working with your company's database?
- There is a database jungle out there! What's the right choice for you?
- What do you need to know to run a database - is it complicated or a piece of cake?
