Monday 12:35–13:05 in Track 3

Overview of imbalanced data prediction methods

Robert Kostrzewski

Audience level:
Intermediate

Description

The imbalance ratio describes the relation between the frequencies of samples belonging to each class in a classification problem. For binary classification datasets, the higher the ratio, the more skewed the distribution of samples between the two classes. The goal of the talk is to compare, both theoretically and practically, various recent methods of dealing with this problem.

Abstract

Intro

The imbalance ratio is a notion applicable to machine learning classification problems. It describes the relation between the frequencies of samples belonging to each class. For binary classification datasets, the higher the ratio, the more skewed the distribution of samples between the two classes. The presentation describes various recent methods of dealing with the imbalance problem. The techniques are compared with the support of theoretical explanation, references to the papers that define them, and experiments performed on real datasets.
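The abstract does not give a formula for the imbalance ratio. A minimal sketch, assuming the common convention of dividing the majority class count by the minority class count (so a perfectly balanced dataset has a ratio of 1.0), could look like this. The function name `imbalance_ratio` and the sample labels are hypothetical and used only for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority class count divided by minority class count.

    Assumption: this majority/minority convention is one common definition;
    the talk itself may define the ratio differently.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    return max(counts.values()) / min(counts.values())


# Hypothetical binary dataset: 90 negative samples and 10 positive samples.
labels = [0] * 90 + [1] * 10
print(imbalance_ratio(labels))  # 9.0
```

Under this convention, a higher value means a rarer minority class, which matches the statement above that a higher ratio corresponds to a more skewed class distribution.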

Algorithms

The following algorithms are introduced during the talk:

Experiments

As part of the work on a paper titled "Imbalanced data classification using MapReduce and relief", the algorithms mentioned above were compared in experiments on 11 datasets of various sizes and imbalance ratios.
