This practical tutorial uses a stack of Python packages, including pandas and seaborn, for data manipulation, summarization, analysis and visualization. All examples work with public data sets.
Abstract
Outline of tutorial
Introducing the Anaconda Python distribution and JupyterLab IDE
Data types
Loops and list comprehensions
Loading and using packages
Introduction to the pandas package
Importing data from CSV, Excel and SQL databases
Data types in pandas (numerical, categorical, binary, boolean)
Creating numerical summaries
Exploring data grouped by a set of variables
Exploratory statistical graphics using the seaborn package
Estimating basic statistics like mean, median, standard deviation and quantiles
Basic probability distributions (normal/Gaussian, binomial, Poisson, exponential, chi-squared), including generating random numbers and finding critical values
How pandas creates dummy variables from categorical variables
Linear and logistic regression and the formula interface
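
As a rough taste of the pandas portion of the outline, the sketch below computes numerical summaries, grouped summaries and dummy variables. It is only an illustration under stated assumptions: the data frame, its column names (species, mass) and its values are made up here and are not one of the tutorial's public data sets.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "species": pd.Categorical(["a", "a", "b", "b", "b"]),
    "mass": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Numerical summaries: mean, median, standard deviation and quartiles.
overall = df["mass"].agg(["mean", "median", "std"])
quartiles = df["mass"].quantile([0.25, 0.5, 0.75])

# The same kind of summary, computed within each level of a categorical variable.
by_species = df.groupby("species", observed=True)["mass"].agg(["mean", "median"])

# Dummy (indicator) variables: one column per category level.
dummies = pd.get_dummies(df["species"], prefix="species")

print(overall)
print(quartiles)
print(by_species)
print(dummies)
```

Storing species as a categorical column is a choice made for this sketch; get_dummies also accepts plain string columns.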