Tuesday 9:00 AM–10:30 AM in Tutorial Track 3

Introduction to Data Analysis with Python datatable

Ana Castro Salazar, Pasha Stetsenko

Audience level:
Novice

Description

In this tutorial, an introduction to data analysis with Python datatable, you will learn data wrangling with datatable through a banking loan scenario that uses a subset of the Fannie Mae and Freddie Mac datasets. We will show how to munge loan-level data, obtain basic insights, and carry out exploratory data analysis, model development, and model evaluation.

Abstract

During the tutorial session, we will work through a banking loan scenario built on a subset of the Fannie Mae and Freddie Mac datasets and show how to munge loan-level data. We will also give an overview of how Python datatable is used to obtain basic insights, starting with data wrangling and continuing through exploratory data analysis, model development, and model evaluation.

Python datatable is a library that implements a wide (and growing) range of operators for manipulating two-dimensional data frames. It focuses on big data support, high performance, handling both in-memory and out-of-memory datasets, and multithreaded algorithms. Datatable’s powerful API is similar to that of R’s data.table, and it strives to provide a friendlier, more intuitive API experience, with helpful error messages to accelerate problem-solving.

Learn more about Python datatable: https://github.com/h2oai/datatable

Prerequisites

Note: As of now, datatable is only supported on Linux and Mac OS X. However, it can be used on Windows via a Docker container.

Tutorial:
