Saturday 2:20 p.m.–3 p.m.

If It Weighs the Same as a Duck: Detecting Fraud with Python and Machine Learning

Ryan Wang

Audience level:
Intermediate

Description

Stripe's system for preventing fraudulent payments uses a mix of machine learning and data analysis. This talk will describe some of the technical challenges we’ve faced in building it. In particular, I will discuss how we’ve used (and occasionally written) various Python packages as part of a broader ecosystem to address data processing, feature engineering, and model evaluation problems.

Abstract

Stripe's system for preventing fraudulent payments uses a mix of machine learning and data analysis. Over the last few years, it has evolved from a collection of manually assembled, ad hoc rules into an ensemble of machine-learned models trained on historical data from across the entire Stripe network. This talk will describe some of the technical challenges we've faced in building and scaling it. In particular, I will discuss how we've used (and occasionally written) various Python packages as part of a broader ecosystem to address data processing, feature engineering, and model evaluation problems.

Some examples:

  • We use scikit-learn to train the majority of our models
  • We use luigi to manage long-running feature generation jobs and model training scripts
  • We use pandas to debug models and features that generate systematic false positives
  • We wrote topmodel to evaluate model performance on both production and backtested data
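
To make the scikit-learn and pandas items above concrete, here is a minimal sketch of the general pattern: train a classifier, score held-out charges, and use pandas to look at where the false positives cluster. It is not Stripe's actual pipeline. The data is synthetic, and every column name, the label rule, the choice of a random forest, and the 0.5 blocking threshold are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: all column names and distributions are hypothetical.
rng = np.random.default_rng(0)
n = 2000
charges = pd.DataFrame({
    "amount_usd": rng.exponential(50.0, n),
    "card_country_mismatch": rng.integers(0, 2, n),
    "charges_last_hour": rng.poisson(1.0, n),
})

# Made-up labeling rule with noise, only so the model has a signal to learn.
risk = (
    0.02
    + 0.25 * charges["card_country_mismatch"]
    + 0.05 * (charges["charges_last_hour"] > 2)
)
charges["is_fraud"] = rng.random(n) < risk

features = ["amount_usd", "card_country_mismatch", "charges_last_hour"]
train, test = train_test_split(
    charges, test_size=0.3, random_state=0, stratify=charges["is_fraud"]
)

# Train a classifier with scikit-learn (model choice is an assumption).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[features], train["is_fraud"])

# Score held-out charges; column 1 of predict_proba is the fraud class (True).
scored = test.copy()
scored["score"] = model.predict_proba(test[features])[:, 1]
scored["blocked"] = scored["score"] >= 0.5  # hypothetical blocking threshold

# False positives: legitimate charges the model would have blocked.
false_positives = scored[scored["blocked"] & ~scored["is_fraud"]]

# Among legitimate charges, how often is each segment blocked?
# A segment with an unusually high rate hints at a systematic false positive.
fp_rate_by_segment = (
    scored[~scored["is_fraud"]]
    .groupby("card_country_mismatch")["blocked"]
    .mean()
)
print(false_positives[features + ["score"]].describe())
print(fp_rate_by_segment)
```

Grouping the legitimate charges by one feature at a time is a simple way to spot a feature that drives blocks on good customers. In practice the grouping columns would be whatever segments matter for the business, such as merchant or country.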
