This tutorial offers a quick overview of the essentials of statistics used to solve real-world problems. We'll start by building a simple hypothesis test based on a practical e-commerce problem. Then we'll see how to expand on this simple test using one of the most powerful tools in statistics: the linear model. No previous experience with statistics is required!
Introduction to Probability and Statistics
Speaker: Will Kurt
Audience: Anyone interested in learning how to apply statistics to practical problems!
This 90-minute tutorial from the author of “Bayesian Statistics the Fun Way” provides a quick overview of using statistics to solve real-world problems. Each statistical topic is explained in the course of solving a practical example problem. The tutorial focuses on comparing the performance of two products in an e-commerce catalog. You will learn how to use statistics to:
- Determine the best estimate of the rate at which a product is purchased.
- Improve this estimate with past performance data.
- Compare the performance of one product to another.
- See how your estimates change as you get more data.
- Model the impact that a sale had on the product’s performance.
- Separate the effects of the sale from product performance when comparing products.
All steps in the tutorial will involve demonstrations with Python code. We’ll be making use of NumPy, pandas, Matplotlib, Jupyter, and PyMC3. By the end of this tutorial you will have walked through the process of reasoning statistically about a real data problem.
1. Foundations of Probability and Statistics (30 minutes)
Focus: measuring the performance of a product in an online catalog
A. Probability: the logic of uncertainty (15 minutes)
- Introduction to the basic rules of probability
- Development of the Binomial Distribution from first principles
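As a rough illustration of the first-principles idea, the sketch below builds the binomial probability from two pieces: the number of ways k purchases can be arranged among n views, and the probability of any one such arrangement. It assumes each view is an independent trial with the same purchase probability. The function name and the numbers are hypothetical and not taken from the tutorial materials.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k purchases in n independent views."""
    # comb(n, k) counts the orderings of k purchases among n views;
    # each single ordering has probability p**k * (1 - p)**(n - k).
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical numbers: 20 views, each with a 10% chance of a purchase.
n_views, purchase_rate = 20, 0.1
pmf = [binomial_pmf(k, n_views, purchase_rate) for k in range(n_views + 1)]

# The probabilities over all possible outcomes should sum to 1.
print("total probability:", sum(pmf))
for k in range(5):
    print(f"P({k} purchases) = {pmf[k]:.4f}")
```

Only the standard library is used here so the example stays self-contained.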
B. Statistical Inference: probability in reverse! (15 minutes)
- Estimating the probability of an event using the Beta Distribution
- How our beliefs change over time
- Using prior information to improve our beliefs
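The three points above can be sketched with the standard Beta-binomial update: a Beta(a, b) prior combined with observed purchases and non-purchases gives a Beta(a + purchases, b + non-purchases) posterior. This is a minimal sketch using only the standard library. The observation counts and the "informative" prior are hypothetical stand-ins for real catalog data and past performance.

```python
import random

def beta_posterior(purchases, views, prior_a=1.0, prior_b=1.0):
    """Beta(prior_a, prior_b) prior + binomial data -> Beta posterior parameters."""
    return prior_a + purchases, prior_b + (views - purchases)

def beta_mean(a, b):
    return a / (a + b)

def credible_interval(a, b, mass=0.95, n_samples=20000, seed=0):
    """Approximate central credible interval from sorted posterior samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    lower = samples[int((1 - mass) / 2 * n_samples)]
    upper = samples[int((1 + mass) / 2 * n_samples) - 1]
    return lower, upper

# Hypothetical cumulative (purchases, views) as more data arrives.
observations = [(2, 10), (12, 100), (110, 1000)]
# Hypothetical prior from past performance: roughly 5 purchases per 100 views.
priors = {"flat": (1.0, 1.0), "informative": (5.0, 95.0)}

widths = {}
for name, (prior_a, prior_b) in priors.items():
    for purchases, views in observations:
        a, b = beta_posterior(purchases, views, prior_a, prior_b)
        lower, upper = credible_interval(a, b)
        widths[(name, views)] = upper - lower
        print(f"{name:12s} views={views:5d} mean={beta_mean(a, b):.3f} "
              f"95% interval=({lower:.3f}, {upper:.3f})")
```

The interval narrows as views accumulate, and the informative prior keeps the small-sample estimate from swinging as far as it does under the flat prior.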
2. Parameter Estimation and Hypothesis Testing (20 minutes)
Focus: Comparing two products: which is better and by how much?
A. Hypothesis testing as parameter estimation
- Estimating two product purchase rates
- Comparing these estimates
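One way to sketch "hypothesis test as parameter estimate" is to estimate each product's purchase rate with its own Beta posterior and then compare the two by sampling. The sketch below assumes flat Beta(1, 1) priors and uses hypothetical purchase counts. It is an illustration of the idea, not the tutorial's own code.

```python
import random

def posterior_samples(purchases, views, n_samples, rng, prior_a=1.0, prior_b=1.0):
    """Draw samples of the purchase rate from its Beta posterior."""
    a = prior_a + purchases
    b = prior_b + (views - purchases)
    return [rng.betavariate(a, b) for _ in range(n_samples)]

rng = random.Random(42)
n = 20000

# Hypothetical data: (purchases, views) for each product.
samples_a = posterior_samples(36, 400, n, rng)
samples_b = posterior_samples(50, 400, n, rng)

# Pairing independent draws gives samples of the difference in rates.
diffs = sorted(b - a for a, b in zip(samples_a, samples_b))

prob_b_better = sum(d > 0 for d in diffs) / n             # "which is better?"
median_diff = diffs[n // 2]                               # "by how much?"
interval = (diffs[int(0.025 * n)], diffs[int(0.975 * n) - 1])

print(f"P(B better than A) ~ {prob_b_better:.3f}")
print(f"median difference  ~ {median_diff:.4f}")
print(f"95% interval       ~ ({interval[0]:.4f}, {interval[1]:.4f})")
```

Working with the distribution of the difference answers both parts of the focus question at once: the share of positive differences says which product is likely better, and the interval says by how much.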
B. Improving our hypothesis tests with prior probabilities
- Incorporating prior probabilities into our estimates
- How priors protect us from the “early stopping” problem
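The effect of a prior on early results can be sketched by computing the probability that B beats A after only a handful of views, once with a flat prior and once with an informative one. All numbers are hypothetical: 10 views per product, and a Beta(5, 95) prior standing in for past performance of about 5 purchases per 100 views.

```python
import random

def prob_b_beats_a(data_a, data_b, prior, n_samples=20000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under a shared Beta prior.

    data_a and data_b are (purchases, views) tuples.
    """
    rng = random.Random(seed)
    prior_a, prior_b = prior
    wins = 0
    for _ in range(n_samples):
        rate_a = rng.betavariate(prior_a + data_a[0],
                                 prior_b + data_a[1] - data_a[0])
        rate_b = rng.betavariate(prior_a + data_b[0],
                                 prior_b + data_b[1] - data_b[0])
        wins += rate_b > rate_a
    return wins / n_samples

# Hypothetical early peek: B got lucky with 2 purchases in its first 10 views.
early_a = (0, 10)
early_b = (2, 10)

p_flat = prob_b_beats_a(early_a, early_b, prior=(1.0, 1.0))
p_informative = prob_b_beats_a(early_a, early_b, prior=(5.0, 95.0))

print(f"flat prior:        P(B > A) ~ {p_flat:.3f}")
print(f"informative prior: P(B > A) ~ {p_informative:.3f}")
```

With the flat prior, a lucky early streak already looks like strong evidence, which is what tempts an analyst to stop the test early. The informative prior tempers that evidence until more data arrives.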
3. Linear models for statistical inference (40 minutes)
Focus: What do we do when our test is influenced by random discounts?
A. Brief intro to PyMC3
- Creating a simple PyMC3 model
B. Rebuilding our problem as a linear model
- Basic linear model for product performance
- Comparing results
C. Testing more complex situations
- Adjusting for random discounting of products
- Understanding the product comparison
- Understanding the impact of the discount
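To give a feel for how a linear model separates the product effect from the discount effect, here is a dependency-free sketch. It is not the PyMC3 model from the tutorial: it swaps in a plain maximum-likelihood logistic regression fitted by gradient ascent, so it gives point estimates only. The simulated data, the "true" coefficients, and the assumption that discounts happened to land mostly on product A are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical "true" effects on the log-odds scale, used only to simulate data.
TRUE_BASE, TRUE_PRODUCT_B, TRUE_DISCOUNT = -3.0, 0.4, 1.5

rng = random.Random(7)
# cells[(is_b, discounted)] = [views, purchases]
cells = {(b, d): [0, 0] for b in (0, 1) for d in (0, 1)}
for _ in range(20000):
    is_b = int(rng.random() < 0.5)
    # Assumption: discounts happened to land mostly on product A.
    discounted = int(rng.random() < (0.2 if is_b else 0.8))
    p = sigmoid(TRUE_BASE + TRUE_PRODUCT_B * is_b + TRUE_DISCOUNT * discounted)
    cells[(is_b, discounted)][0] += 1
    cells[(is_b, discounted)][1] += int(rng.random() < p)

def raw_rate(is_b):
    """Naive purchase rate for a product, ignoring discounts."""
    views = sum(cells[(is_b, d)][0] for d in (0, 1))
    purchases = sum(cells[(is_b, d)][1] for d in (0, 1))
    return purchases / views

def fit_logistic(cells, steps=20000, lr=1.0):
    """Fit log-odds = w0 + w1*is_b + w2*discounted by gradient ascent."""
    total = sum(views for views, _ in cells.values())
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (is_b, disc), (views, purchases) in cells.items():
            x = (1.0, is_b, disc)
            predicted = views * sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(3):
                grad[j] += (purchases - predicted) * x[j]
        # Step along the average log-likelihood gradient.
        w = [wi + lr * g / total for wi, g in zip(w, grad)]
    return w

w = fit_logistic(cells)
print(f"raw rate A = {raw_rate(0):.3f}, raw rate B = {raw_rate(1):.3f}")
print(f"base = {w[0]:.2f}, product B effect = {w[1]:.2f}, discount effect = {w[2]:.2f}")
```

In this simulated setup the raw rates favor product A only because A was discounted more often. The fitted product coefficient compares the two products at the same discount status, and the discount coefficient describes the impact of the sale itself. A Bayesian version in PyMC3 would place priors on the same three coefficients and return full posterior distributions instead of point estimates.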