Thursday October 28 2:30 PM – Thursday October 28 4:00 PM in Workshop/Tutorial II

Know Your Data First: An Introduction to Exploratory Data Analysis

Sin-seok SEO

Prior knowledge:
Previous knowledge expected
Python, Pandas, Jupyter Notebook, Matplotlib

Summary

The very first step of every data science project is to understand the data themselves. The Python ecosystem serves that purpose well, with libraries such as pandas, matplotlib, and seaborn. This hands-on tutorial introduces comprehensive Exploratory Data Analysis (EDA) techniques based on these libraries. It covers data loading, preprocessing, and, most importantly, visualization.

Description

Tutorial materials are available here.

  1. Introduction to EDA
  2. Data loading and preprocessing
    • Loading a csv file
    • Merging many csv files
    • Essential checks: number of samples, column names, unique values, missing values, etc.
    • sidetable
    • Preprocessing & Feature Engineering
  3. Statistical Visualizations
    • matplotlib: basic building block, essential for fine-tuning
    • pandas: data manipulation + plotting
    • seaborn: handy matplotlib wrapper for statistical visualizations
  4. (Easy Enough) Interactive Visualizations
    • ipywidgets
    • plotly and plotly express
    • bokeh
    • altair
  5. Automatic EDA Report
    • dtale
    • pandas-profiling
    • sweetviz
    • autoviz
  6. Wrap-up and Some Tips
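
As a rough illustration of item 2 in the outline (loading CSV data, merging several files, and running the essential checks), the following is a minimal sketch using pandas only. It is not taken from the tutorial materials: the in-memory CSV strings, the column names, and the helper functions `load_and_merge` and `essential_checks` are hypothetical and stand in for real files and whatever approach the tutorial itself uses.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for files on disk (an assumption
# for this sketch; real code would pass file paths to pd.read_csv).
CSV_PART_1 = "id,city,temp\n1,Seoul,21.5\n2,Busan,\n"
CSV_PART_2 = "id,city,temp\n3,Seoul,19.0\n4,Daegu,23.1\n"


def load_and_merge(sources):
    """Read each CSV source and stack the rows into one DataFrame."""
    frames = [pd.read_csv(src) for src in sources]
    return pd.concat(frames, ignore_index=True)


def essential_checks(df):
    """Collect the basic facts named in the outline's 'essential checks'."""
    return {
        "n_samples": len(df),                     # number of rows
        "columns": list(df.columns),              # column names
        "n_unique": df.nunique().to_dict(),       # distinct non-missing values per column
        "n_missing": df.isna().sum().to_dict(),   # missing values per column
    }


df = load_and_merge([io.StringIO(CSV_PART_1), io.StringIO(CSV_PART_2)])
summary = essential_checks(df)
print(summary)
```

With real data, the same two helpers could be called on a list of file paths instead of in-memory strings; the checks are cheap and are usually worth running before any plotting.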