PyData Los Angeles 2019 - Presentation: Introduction to Data Analysis with Python datatable

In this tutorial, an introduction to data analysis with Python datatable, you will learn data wrangling with datatable through a banking loan scenario that uses a subset of the Fannie Mae and Freddie Mac datasets. We will show how to munge loan-level data and give an overview of how Python datatable is used to obtain basic insights, starting with data wrangling and continuing through exploratory data analysis, model development, and model evaluation.

Python datatable is a library that implements a wide (and growing) range of operators for manipulating two-dimensional data frames. It focuses on big data support, high performance, both in-memory and out-of-memory datasets, and multithreaded algorithms. Datatable's powerful API is similar to that of R's data.table, and it strives to provide a friendlier, more intuitive API experience with helpful error messages to accelerate problem-solving.

Learn more about Python datatable: https://github.com/h2oai/datatable

Prerequisites

Basic knowledge of Statistics and Machine Learning
Basic knowledge of Python
JupyterLab
Python datatable installed on your local machine, or a cloud environment with datatable available:
- to install datatable, follow the instructions at https://datatable.readthedocs.io/en/latest/install.html

Note: At the time of this tutorial, datatable is supported only on Linux and Mac OS X. On Windows, it can be used via a Docker container.

Tutorial:

Task 0: Introduction to Python datatable (10 mins)
Task 1: datatable vs Pandas (10 mins)
Task 2: Understand the dataset (10 mins)
Task 3: datatable - Data Wrangling (10 mins)
Task 4: datatable - Exploratory Data Analysis (10 mins)
Task 5: datatable - Model Development (10 mins)
Task 6: datatable - Model Evaluation (10 mins)
Task 7: Q&A (10 - 15 mins)

Tuesday 9:00 AM–10:30 AM in Tutorial Track 3


Ana Castro Salazar, Pasha Stetsenko
