PyData Washington DC 2018 - Presentation: Probabilistic Programming in the Real World

Probabilistic Programming in the Real World

Probabilistic programming frameworks get a lot of press, but making the jump from conceptual understanding to building and deploying a custom system takes a lot of stumbling and outside research. This talk aims to address that gap and offer concrete strategies for employing Bayesian modeling frameworks like Stan, PyMC3, and Turing.jl to solve real-world problems. Worked examples are in PyMC3, but the concepts transfer between frameworks.

Toy example (5 minutes) I'll walk through a detailed example of a real probabilistic program I created for a problem I encountered: determining how funny a joke is. I'll cover the problem statement, the telltale signs that it calls for a probabilistic programming approach, and the final model structure in PyMC3, all of which generalize to other problems of hidden parameter estimation.
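To give a feel for what hidden parameter estimation means here, the following is a minimal sketch in plain Python, not the PyMC3 model from the talk. It assumes a joke has an unobserved laugh probability, that each telling is an independent laugh-or-no-laugh trial, and that our belief about the probability is a Beta distribution. The `Beta` class, the `update` helper, and the data (7 laughs in 10 tellings) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) belief about a joke's hidden laugh probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta / (total ** 2 * (total + 1))) ** 0.5


def update(prior: Beta, laughs: int, tellings: int) -> Beta:
    """Conjugate update: each laugh adds to alpha, each silence adds to beta."""
    return Beta(prior.alpha + laughs, prior.beta + (tellings - laughs))


# Hypothetical data: the joke got 7 laughs in 10 tellings.
prior = Beta(1.0, 1.0)  # uniform prior: no opinion about the joke yet
posterior = update(prior, laughs=7, tellings=10)
print(f"funniness ~ {posterior.mean:.3f} +/- {posterior.std:.3f}")
```

This conjugate case has a closed-form answer, which is why no sampler is needed. A framework like PyMC3 becomes useful once the model grows beyond what can be solved by hand, for example when audiences and jokes each carry their own hidden parameters.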

Theoretical underpinnings for Bayesian modeling (10 minutes) I'll provide some light background on how Bayes' rule can be used to construct and solve probabilistic graphical models. Then we'll compare Monte Carlo sampling with numerical integration for computing the denominator of Bayes' rule, and detail why the Metropolis algorithm is superior to a hill-climbing approach.
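The contrast between the two routes can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the speaker's code: the data (7 laughs in 10 tellings), the Uniform(0, 1) prior, and the names `log_target`, `evidence_by_grid`, and `metropolis` are all hypothetical. The grid function integrates prior times likelihood numerically to get the denominator, while the random-walk Metropolis sampler never computes it, because the denominator cancels in the acceptance ratio.

```python
import math
import random

LAUGHS, TELLINGS = 7, 10  # hypothetical data


def log_target(theta: float) -> float:
    """Log of prior x likelihood under a Uniform(0, 1) prior.

    The binomial coefficient is dropped: it is a constant that cancels
    in the Metropolis ratio and only rescales the grid integral.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return LAUGHS * math.log(theta) + (TELLINGS - LAUGHS) * math.log(1.0 - theta)


def evidence_by_grid(n_cells: int = 1000) -> float:
    """Midpoint-rule integral of the unnormalized posterior over (0, 1)."""
    width = 1.0 / n_cells
    return sum(math.exp(log_target((i + 0.5) * width)) for i in range(n_cells)) * width


def metropolis(n_samples: int, start: float = 0.5, step: float = 0.2, seed: int = 0) -> list:
    """Random-walk Metropolis: propose a nearby point, accept by density ratio."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Uphill moves are always accepted; downhill moves are accepted
        # with probability target(proposal) / target(current).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


evidence = evidence_by_grid()
samples = metropolis(20_000)[1_000:]  # discard the first 1,000 draws as burn-in
posterior_mean = sum(samples) / len(samples)
print(f"grid evidence: {evidence:.6f}, Metropolis posterior mean: {posterior_mean:.3f}")
```

A hill climber on the same target would only ever accept uphill moves, so it would settle at the single most probable value (0.7 for this data) and report nothing about the spread around it. Because Metropolis sometimes accepts downhill moves, its samples describe the whole posterior, and grid integration, which is cheap in one dimension, becomes impractical as the number of parameters grows.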

Identifying problems that lend themselves to probabilistic programming (15 minutes) This section deals with spotting real-world problems for which a Bayesian modeling approach (and an associated open-source framework) is particularly applicable. I'll focus on three markers: (1) small or expensive labeled data, (2) reasoning with uncertainty, and (3) imparting expert knowledge onto a model.

For each of these markers, I'll provide background and a real-world example from industry.

Probabilistic programming in the modern data science toolbox (5 minutes) To close the talk, we'll discuss the role of probabilistic programming as a tool in the full-stack data scientist's utility belt. We'll review the relationship between Bayesian frameworks and problems in deep learning and natural language processing.

Questions and discussion (5 minutes)

Sunday 11:00 AM–11:45 AM in Modeling & Data Techniques - Rm 100A

Zach Anglin
