PyData Warsaw 2018 - Presentation: Overview of imbalanced data prediction methods

The imbalance ratio describes how unevenly samples are distributed across the classes of a classification problem. In binary classification, the higher the ratio, the more one class outnumbers the other. The talk compares, both theoretically and practically, several recent methods for dealing with this problem.

Intro

The imbalance ratio is a measure used in machine learning classification problems. It describes the relative frequency of samples belonging to each class. In binary classification, the higher the ratio, the more one class outnumbers the other. The presentation describes several recent methods for dealing with the imbalance problem. The techniques are compared using theoretical explanation, references to the papers that define them, and experiments performed on real datasets.
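As a small illustration of the measure, the sketch below computes an imbalance ratio from a list of class labels. It assumes the common convention of dividing the size of the largest class by the size of the smallest one; the talk description does not state a formula, and the function name `imbalance_ratio` is only illustrative.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    Assumption: this is the usual convention for the imbalance ratio,
    not a formula taken from the talk itself.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are needed")
    return max(counts.values()) / min(counts.values())


# 90 negative and 10 positive samples: the majority class is 9 times larger.
labels = [0] * 90 + [1] * 10
print(imbalance_ratio(labels))  # 9.0
```

A perfectly balanced binary dataset gives a ratio of 1.0 under this convention, and the value grows as the minority class becomes rarer.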

Algorithms

The following algorithms are introduced during the talk:

Splitbal - a technique based on splitting the majority class into subsets, building sub-classifiers, and ensembling them. Described in "A novel ensemble method for classifying imbalanced data"
SMOTE - Synthetic Minority Over-sampling TEchnique, a method that oversamples the minority class by generating synthetic examples. Described in "SMOTE: Synthetic minority over-sampling technique"
EDBC - a dissimilarity-based method for classifying imbalanced data. Described in "A dissimilarity-based imbalance data classification algorithm"
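Of the three, SMOTE is the easiest to illustrate in a few lines. The sketch below shows its core idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is a simplified, standard-library-only illustration and not the reference implementation from the paper. The function name `smote_sketch`, its parameters, and the random choice of base samples are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority-class points.

    Hypothetical helper: `minority` is a list of equal-length numeric
    tuples. Base samples are drawn at random here, which simplifies
    the per-sample loop of the original algorithm.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_sketch(minority_points, n_new=3, k=2):
    print(point)
```

Because each new point is an interpolation between two existing minority samples, it always lies inside the region those samples span. For practical work, a maintained implementation is a better choice than this sketch.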

Experiments

As part of the work on the paper Imbalanced data classification using MapReduce and relief, the algorithms mentioned above were compared in experiments run on 11 datasets of various sizes and imbalance ratios.

Monday 12:35–13:05 in Track 3

Robert Kostrzewski
