
Performance Analysis of Under-Sampling and Over-Sampling Techniques for Solving Class Imbalance Problem


Document pages: 11 pages

Abstract: Most traditional classification algorithms assume that their training data are well balanced in terms of class distribution. Real-world datasets, however, are often imbalanced, which degrades the performance of traditional classifiers and makes accurate prediction difficult. Data pre-processing approaches address this issue at the data level: they balance the distribution between the majority and minority classes using either random under-sampling or oversampling techniques. In this paper, we present a performance analysis of an under-sampling method and several oversampling methods. The methods are implemented with five conventional classifiers, namely C4.5 Decision Tree (DT), k-Nearest Neighbor (k-NN), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Naive Bayes (NB), on 15 real-life datasets. The experimental results provide a comparative study of the under-sampling and oversampling techniques.
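
As a rough illustration of the two data-level strategies named in the abstract, the following Python sketch shows random under-sampling and random oversampling on a toy dataset. The function names, the toy data, and the choice to balance every class to the smallest (or largest) class size are assumptions made for illustration only; the paper's own implementation and the specific oversampling variants it evaluates are not shown in this excerpt.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the feature rows belonging to each class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return groups


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Sample without replacement, so no row is duplicated.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Keep all original rows and add random copies (sampling with replacement).
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority-class rows (label 0), 2 minority-class rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)

print("original:     ", Counter(y))
print("under-sampled:", Counter(y_under))
print("over-sampled: ", Counter(y_over))
```

In this sketch, under-sampling discards majority-class rows and so loses information, while oversampling keeps every row but repeats minority-class rows, which can encourage overfitting. In practice, resampling would normally be applied to the training split only, so that the test data keep their original class distribution.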
