Thursday 11:30 AM–12:15 PM in Track 2 - Kodiak

Scan Statistics with Spark Streaming: Distribution Based Real Time Anomaly Detection

Michal Monselise

Audience level:
Intermediate

Description

Scan statistics is a distribution-based methodology for detecting anomalies. This talk explores how to use scan statistics for real-time analysis of streaming data with Spark Streaming.

Abstract

Scan statistics is a distribution-based methodology for detecting anomalous data. Simpler methods such as moving averages and exponential smoothing rely on previous data; scan statistics instead performs a hypothesis test on the distribution of the data, so the analysis can run in real time. Spark Streaming is a framework that lends itself well to this use case. This talk introduces a Python package built for Spark Streaming that performs real-time anomaly detection using various distributions of count data.
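
To make the hypothesis-testing idea concrete, here is a minimal sketch in plain Python. It is not the package presented in the talk. It assumes that counts under normal conditions follow a Poisson distribution with a known baseline rate per interval, and it tests each sliding window separately with an upper-tail test. A full scan statistic would also adjust for taking the maximum over many overlapping windows, which this sketch leaves out. The names `poisson_sf`, `scan_windows`, `baseline_rate`, `window` and `alpha` are hypothetical.

```python
import math
from collections import deque


def poisson_sf(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    # Accumulate P(X = 0), ..., P(X = k - 1) iteratively to avoid large factorials.
    term = math.exp(-mu)
    cdf = term
    for i in range(1, k):
        term *= mu / i
        cdf += term
    # Clamp at zero in case rounding pushes the CDF slightly above 1.
    return max(0.0, 1.0 - cdf)


def scan_windows(counts, baseline_rate, window, alpha=0.01):
    """Yield (end_index, window_count, p_value, is_anomaly) for each full window.

    Assumption: under the null hypothesis each interval's count is
    Poisson(baseline_rate), so a window of `window` intervals has a
    Poisson(baseline_rate * window) total.
    """
    buf = deque(maxlen=window)
    total = 0
    for i, c in enumerate(counts):
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(c)
        total += c
        if len(buf) == window:
            p_value = poisson_sf(total, baseline_rate * window)
            yield i, total, p_value, p_value < alpha


if __name__ == "__main__":
    # Made-up counts per interval, with a burst in the middle.
    stream = [2, 1, 3, 2, 2, 15, 14, 2, 1]
    for end, count, p_value, flagged in scan_windows(stream, baseline_rate=2.0, window=3):
        print(end, count, f"{p_value:.3g}", "ANOMALY" if flagged else "ok")
```

Because the test only needs the current window's count and the assumed null distribution, it can be evaluated as each new batch arrives, which is what makes the approach a natural fit for a streaming framework.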
