This tutorial introduces data analysis with the Python Pandas library. It is meant for absolute beginners to both Pandas and Python as a programming language. We will begin by loading tabular data, calculating summary statistics, and visualizing the data. Next, we will learn several ways to join multiple datasets and how to work with missing values.
The purpose of the tutorial is to introduce users to Pandas for data analysis and to move them toward a more reproducible workflow than spreadsheet programs allow. The first part shows how to load tabular data, view its columns and rows, and quickly create descriptive statistical plots. Next, we will cover Pandas' internal data structures, along with some fundamental knowledge about Python as a programming language, mainly object methods. Finally, before moving on to basic data cleaning examples, we will cover data visualization using matplotlib, seaborn, and Pandas itself.
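As a rough preview of the loading and viewing steps described above, here is a minimal sketch. The inline data, the column names (`country`, `year`, `life_exp`), and the variable names are made up for illustration and are not the tutorial's actual dataset; in practice `pd.read_csv` would usually be given a file path.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a CSV file on disk.
csv_text = """country,year,life_exp
Canada,2002,79.8
Canada,2007,80.7
Kenya,2002,51.0
Kenya,2007,54.1
"""

# Load tabular data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# View the first rows, a single column, and a single row.
print(df.head())
print(df["life_exp"])
print(df.loc[0])

# Methods are called on the object itself, e.g. describe() for summary statistics.
print(df["life_exp"].describe())

# Group by one column and compute the mean of another.
mean_by_country = df.groupby("country")["life_exp"].mean()
print(mean_by_country)
```

Because every step lives in a script rather than in a sequence of mouse clicks, rerunning the analysis on updated data means rerunning the same file.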
The next section of the tutorial will show learners how to assemble and merge multiple datasets and how to work with missing values. Lastly, before we get to fitting models, I will go over how to recode variables for analysis.
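The merging, missing-value, and recoding steps mentioned above might look something like the sketch below. The two small tables and their column names (`id`, `sex`, `score`) are hypothetical, and filling missing values with the column mean is just one possible choice, not necessarily the approach taken in the tutorial.

```python
import pandas as pd

# Two hypothetical tables that share an "id" column.
people = pd.DataFrame({"id": [1, 2, 3], "sex": ["f", "m", "f"]})
scores = pd.DataFrame({"id": [1, 2], "score": [90.0, 75.0]})

# A left merge keeps every row of `people`; id 3 has no score, so it becomes NaN.
merged = people.merge(scores, on="id", how="left")

# Count the missing values in the merged column.
n_missing = merged["score"].isna().sum()

# One simple option: fill missing scores with the mean of the observed scores.
filled = merged["score"].fillna(merged["score"].mean())

# Recode a text variable into a 0/1 indicator for later modeling.
merged["is_female"] = merged["sex"].map({"f": 1, "m": 0})

print(merged)
print(n_missing)
print(filled)
```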
The main purpose is to show users how to use Pandas, not how to fit machine learning models. However, I will show a simple model at the end so users can see how data cleaning and analysis fit together in a single workflow.