PyData Delhi 2019 - Presentation: Predicting Real-Time Transaction Fraud Using Python and Spark (Sponsored Talk)

Transaction Fraud Model Development

Predicting debit and credit card transaction fraud in real time is an important challenge that new technologies and state-of-the-art supervised machine learning models can help to solve. Supervised learning techniques such as Logistic Regression and Neural Networks have been used for many years, but recent developments in Deep Learning, Gradient Boosted Machines, and Recurrent Neural Networks have opened up a wealth of options that can provide significant improvements over existing models.

Advantages of Distributed Computing

Transaction volumes are very large (billions of transactions per year), so non-distributed packages like NumPy or pandas easily run out of memory. Distributed computing solves this problem. Spark serves as a solution for raw data processing, data quality and reconciliation, and, most importantly, feature engineering, where thousands of features are created and tested.
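As an illustration of the kind of feature involved, here is a minimal single-machine sketch of a per-card "velocity" feature: the number and total amount of earlier transactions on the same card within a trailing time window. It is plain Python and not the talk's actual pipeline. The function name `velocity_features`, the `(card_id, timestamp, amount)` tuple layout, and the one-hour window are assumptions made for this example. In Spark, the same idea would typically be expressed as a window aggregation partitioned by card so that it scales across a cluster.

```python
from collections import defaultdict, deque


def velocity_features(transactions, window_seconds=3600):
    """Yield (card_id, ts, amount, prior_count, prior_sum) per transaction.

    `transactions` is an iterable of (card_id, ts, amount) tuples sorted by
    ts (hypothetical layout). prior_count and prior_sum cover earlier
    transactions on the same card within the last `window_seconds`; the
    current transaction itself is excluded.
    """
    history = defaultdict(deque)   # card_id -> deque of (ts, amount)
    totals = defaultdict(float)    # card_id -> running sum inside the window

    for card_id, ts, amount in transactions:
        window = history[card_id]
        # Drop transactions that have fallen out of the trailing window.
        while window and ts - window[0][0] > window_seconds:
            _, old_amount = window.popleft()
            totals[card_id] -= old_amount

        yield (card_id, ts, amount, len(window), totals[card_id])

        # Only now add the current transaction, so it never counts itself.
        window.append((ts, amount))
        totals[card_id] += amount
```

Each card only needs its own history, which is why this computation partitions naturally by card identifier when distributed.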

Real Challenges with Fraud Data

Machine learning techniques are in general well suited to transaction fraud. However, large data volumes (billions of transactions per year), a highly imbalanced target class (fraud is a rare event), ever-changing fraud MOs (modus operandi), and strict requirements on inference speed mean that some methods are better suited than others. Using open source technologies like Python and distributed computing with Spark, Barclays has been developing and testing different solutions to reduce fraud losses and limit adverse customer experience.
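To make the class-imbalance point concrete, the following sketch shows why plain accuracy is a poor yardstick when fraud is rare, and why precision and recall at a chosen score threshold are more informative. The function name `precision_recall_accuracy` and the toy numbers in the usage note are illustrative assumptions, not results from the talk.

```python
def precision_recall_accuracy(labels, scores, threshold):
    """Return (precision, recall, accuracy) for a score threshold.

    labels: 1 for fraud, 0 for genuine. scores: model outputs in [0, 1].
    A transaction is flagged as fraud when its score >= threshold.
    """
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1          # fraud correctly flagged
        elif flagged and y == 0:
            fp += 1          # genuine customer wrongly flagged
        elif not flagged and y == 1:
            fn += 1          # fraud missed
        else:
            tn += 1          # genuine transaction passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return precision, recall, accuracy
```

For example, with 2 frauds among 1,000 transactions, a model that never flags anything scores 99.8% accuracy while catching no fraud at all (recall 0). Precision relates to customer experience, since false positives are genuine customers being declined, and recall relates to fraud losses, since false negatives are missed fraud.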

The main emphasis of the talk is to show how to train supervised transaction fraud models that can be implemented, and how these models both improve customer experience and help to reduce fraud losses. The presentation will show results of a machine learning model that is operating in production.

The audience will learn:
- how real-time transaction fraud models work and the main challenges in transaction fraud modelling
- how distributed computing can be an advantage
- which supervised machine learning techniques are most applicable

Saturday 4:30 PM–5:10 PM in C01

Mayank Jain
