PyData New York City 2019 - Presentation: An Introduction to Probability and Statistics

Introduction to Probability and Statistics
Speaker: Will Kurt
Audience: Anyone interested in learning how to apply statistics to practical problems!

This 90-minute tutorial from the author of “Bayesian Statistics the Fun Way” will provide a quick overview of the practice of using statistics to solve real-world problems. Statistical topics will be explained as they come up while solving a practical example problem. The focus of the tutorial will be on comparing the performance of two products in an ecommerce catalog. You will learn how to use statistics to:

Determine the best estimate for the rate at which a product is purchased.
Improve this estimate with past performance data.
Compare the performance of one product to another.
See how your estimates change as you get more data.
Model the impact that a sale has on the product’s performance.
Separate the effects of the sale from product performance when comparing products.

All steps in the tutorial will involve demonstrations with Python code. We’ll be making use of NumPy, pandas, Matplotlib, Jupyter and PyMC3. By the end of this talk you will have walked through the process of reasoning statistically about a real data problem.

1. Foundations of Probability and Statistics (30 minutes)

Focus: measuring the performance of a product in an online catalog

A. Probability: the logic of uncertainty (15 minutes)

Introduction to the basic rules of probability
Development of the Binomial Distribution from first principles
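As a rough illustration of this step (not the speaker's own code), the Binomial Distribution can be built from two counting ideas: the number of ways to arrange k purchases among n visits, and the probability of any one such arrangement. The function name and the numbers below are assumptions chosen for the example.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k purchases in n independent visits."""
    # comb(n, k) counts the orderings of k purchases among n visits;
    # each single ordering has probability p**k * (1 - p)**(n - k).
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical numbers: 100 visitors, each buying with probability 0.05.
n_visitors, purchase_rate = 100, 0.05

pmf = [binomial_pmf(k, n_visitors, purchase_rate) for k in range(n_visitors + 1)]
total = sum(pmf)  # a valid distribution sums to 1 (up to rounding)
most_likely = max(range(n_visitors + 1), key=lambda k: pmf[k])

print(f"total probability: {total:.6f}")
print(f"most likely number of purchases: {most_likely}")
```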

B. Statistical Inference: probability in reverse! (15 minutes)

Estimating the probability of an event using the Beta Distribution
How our beliefs change over time
Using prior information to improve our beliefs.
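A minimal sketch of these three ideas, assuming the standard conjugate update in which a Beta(alpha, beta) belief about a purchase rate becomes Beta(alpha + purchases, beta + non-purchases) after new data. The helper names, the daily counts, and the "informed" prior are all hypothetical.

```python
def update_beta(alpha, beta, purchases, visits):
    """Conjugate update of a Beta belief with Binomial data."""
    return alpha + purchases, beta + (visits - purchases)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical (purchases, visits) observed on three successive days.
daily_data = [(1, 10), (0, 10), (2, 20)]

flat = (1.0, 1.0)       # no prior information
informed = (5.0, 95.0)  # assumed past performance: roughly a 5% purchase rate

for day, (purchases, visits) in enumerate(daily_data, start=1):
    flat = update_beta(*flat, purchases, visits)
    informed = update_beta(*informed, purchases, visits)
    print(f"day {day}: flat-prior mean {beta_mean(*flat):.3f}, "
          f"informed-prior mean {beta_mean(*informed):.3f}")
```

With little data the two estimates differ noticeably; as more visits accumulate, the data increasingly dominates either prior.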

2. Parameter Estimation and Hypothesis testing (20 minutes)

Focus: Comparing two products: which is better and by how much?

A. Hypothesis testing as parameter estimation

Estimating two product purchase rates
Comparing these estimates
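One common way to carry out this comparison, sketched here under assumptions rather than taken from the talk, is to draw samples from each product's Beta posterior and compare them directly. The counts are made up, and a flat Beta(1, 1) prior is assumed for both products.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: purchases and visits for two products.
a_purchases, a_visits = 36, 600
b_purchases, b_visits = 50, 600

# Posterior for each purchase rate under an assumed Beta(1, 1) prior.
samples_a = rng.beta(1 + a_purchases, 1 + a_visits - a_purchases, size=100_000)
samples_b = rng.beta(1 + b_purchases, 1 + b_visits - b_purchases, size=100_000)

# "Which is better?": the share of draws in which B's rate exceeds A's.
prob_b_better = float(np.mean(samples_b > samples_a))

# "By how much?": a 95% interval for the difference in purchase rates.
diff = samples_b - samples_a
low, high = np.percentile(diff, [2.5, 97.5])

print(f"P(B better than A) is about {prob_b_better:.2f}")
print(f"95% interval for the difference: [{low:.3f}, {high:.3f}]")
```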

B. Improving our hypothesis tests with prior probabilities

Incorporating prior probabilities into our estimates
How priors protect us from the “early stopping” problem
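The sketch below illustrates one mechanism by which a prior can help here; it is an assumption about the argument, not the speaker's demonstration. After only ten visits per product, a flat prior can already suggest a confident winner, while a prior based on past performance keeps the conclusion more cautious. All counts and prior parameters are hypothetical.

```python
import numpy as np

def prob_b_beats_a(prior, a_data, b_data, rng, draws=200_000):
    """Monte Carlo estimate of P(rate_B > rate_A) under a shared Beta prior."""
    alpha, beta = prior
    a_buys, a_visits = a_data
    b_buys, b_visits = b_data
    a = rng.beta(alpha + a_buys, beta + a_visits - a_buys, size=draws)
    b = rng.beta(alpha + b_buys, beta + b_visits - b_buys, size=draws)
    return float(np.mean(b > a))

rng = np.random.default_rng(1)

# Hypothetical early peek at the test: (purchases, visits) per product.
early_a = (0, 10)
early_b = (2, 10)

flat_result = prob_b_beats_a((1, 1), early_a, early_b, rng)
informed_result = prob_b_beats_a((5, 95), early_a, early_b, rng)

print(f"flat prior:     P(B > A) is about {flat_result:.2f}")
print(f"informed prior: P(B > A) is about {informed_result:.2f}")
```

Stopping the test at this point looks far less tempting under the informed prior, because two purchases in ten visits is weak evidence against a history of roughly 5% purchase rates.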

3. Linear models for statistical inference (40 minutes)

Focus: What do we do when our test is influenced by random discounts?

A. Brief intro to PyMC3

Creating a simple PyMC3 model

B. Rebuilding our problem as a linear model

Basic linear model for product performance
Comparing results

C. Testing more complex situations

Adjusting for random discounting of products
Understanding the product comparison
Understanding the impact of the discount
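To make the linear-model idea concrete, here is a sketch that deliberately swaps in a simpler technique: a maximum-likelihood logistic regression fitted with Newton's method in NumPy, instead of a Bayesian model in PyMC3. The data is simulated, and the coefficient values and variable names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n = 20_000

# Simulated visits: which product was shown, and whether it was discounted.
is_product_b = rng.integers(0, 2, size=n)
on_discount = rng.integers(0, 2, size=n)  # discounts assigned at random

# Assumed "true" effects on the log-odds scale: intercept, product B, discount.
true_coefs = np.array([-3.0, 0.3, 0.8])
X = np.column_stack([np.ones(n), is_product_b, on_discount])
purchased = (rng.random(n) < sigmoid(X @ true_coefs)).astype(float)

# Fit the logistic regression with Newton's method (maximum likelihood).
coefs = np.zeros(3)
for _ in range(25):
    mu = sigmoid(X @ coefs)
    gradient = X.T @ (purchased - mu)
    hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
    coefs = coefs + np.linalg.solve(hessian, gradient)

intercept, product_effect, discount_effect = coefs
print(f"product B effect (log-odds): {product_effect:.2f}")
print(f"discount effect (log-odds):  {discount_effect:.2f}")
```

Because the discount has its own term in the model, the product coefficient describes the difference between the products with the discount held fixed, which is the separation the outline describes. A Bayesian version would place priors on these coefficients and return distributions rather than point estimates.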

Wednesday 9:00 AM–10:30 AM in Music Box (5411)
