This talk presents a framework we have developed that uses machine learning and other advanced analytical methods to reduce risk in the public sector. The Python-based assurance scoring framework, built with Pandas and Scikit-Learn, shifts the emphasis of traditional risk scoring from detecting non-compliance to identifying compliant behaviour. We discuss some of the challenges faced and present a case study.
Traditionally, risk scoring frameworks are built around the customer journey to identify non-compliant or fraudulent behaviour. These frameworks combine data from different sources with historical known fraud to identify high-risk transactions or applications. In the public sector, however, the emphasis is often on identifying low-risk customers. In this talk we will discuss an assurance scoring framework that applies the same traditional machine learning and analytics techniques but reverses the emphasis, identifying those customers who pose minimal risk. The advantage of this approach is that low-risk transactions, which account for the majority of customers, can be automated, so that resources can be focused more effectively on the exceptional high-risk cases.
The framework has been developed in Python, in particular with Pandas and Scikit-Learn. We also go beyond machine learning to incorporate other techniques such as rule-based linking, anomaly detection and graph-based analysis, and show how these can be used to boost confidence in the low-risk group. In particular, we will showcase how different Python packages have been integrated to address data pre-processing, feature engineering, model building and validation, and how we solved the challenges encountered during integration by developing a range of testing procedures.
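To make the change of emphasis concrete, here is a minimal sketch of how a low-risk ("assured") group might be selected with Scikit-Learn. It is an illustration under stated assumptions, not the framework presented in the talk: the data is synthetic, the choice of `RandomForestClassifier` and `IsolationForest` is ours, and `RISK_THRESHOLD` is a hypothetical cut-off. A case is treated as assured only when the supervised model gives it a very low risk score and the anomaly detector does not flag it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for application data (assumption): y == 1 marks
# historically known non-compliant cases, roughly 10% of the total.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Supervised model: estimated probability that a case is non-compliant.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
risk = clf.predict_proba(X_test)[:, 1]

# Unsupervised check: IsolationForest.predict returns 1 for inliers
# and -1 for anomalies.
iso = IsolationForest(random_state=0).fit(X_train)
is_inlier = iso.predict(X_test) == 1

# Hypothetical cut-off; in practice it would be tuned on validation data.
RISK_THRESHOLD = 0.05

# Assured group: very low predicted risk AND not flagged as anomalous.
assured = (risk <= RISK_THRESHOLD) & is_inlier

print(f"Share of cases assured: {assured.mean():.1%}")
if assured.any():
    print(f"Known non-compliant rate among assured: {y_test[assured].mean():.1%}")
```

Requiring both signals to agree keeps the assured group conservative: anything that is uncertain or unusual stays in the queue for manual review, which is where the remaining resources would be focused.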