Class Imbalance Problem in Classification Domain

Class Imbalance is a problem in Machine Learning [Classification Models] that arises when the data is skewed on the target variable, meaning there is far more data for one class [i.e. Majority Class] than for another [i.e. Minority Class], say in a ratio of 90:10. It can be addressed using:

1. Under Sampling Approach [on Majority Class]
2. Over Sampling Approach [on Minority Class]
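
To make the two approaches concrete, here is a minimal sketch using plain random resampling on toy data with a 90:10 split. The helper names `random_undersample` and `random_oversample`, and the toy data itself, are made up for illustration and are not part of any library.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_label, target_count, seed=0):
    """Keep only `target_count` randomly chosen majority rows (hypothetical helper)."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    other_idx = [i for i, label in enumerate(y) if label != majority_label]
    kept = rng.sample(majority_idx, target_count)
    idx = sorted(kept + other_idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_oversample(X, y, minority_label, target_count, seed=0):
    """Duplicate random minority rows until the class has `target_count` rows (hypothetical helper)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    extra = [rng.choice(minority_idx) for _ in range(target_count - len(minority_idx))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data with a 90:10 split between class 0 and class 1.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_undersample(X, y, majority_label=0, target_count=10)
X_over, y_over = random_oversample(X, y, minority_label=1, target_count=90)

print(Counter(y))        # original class counts
print(Counter(y_under))  # majority class cut down to the minority size
print(Counter(y_over))   # minority class grown to the majority size
```

Under sampling throws away majority rows, while this kind of over sampling only repeats existing minority rows, which is why synthetic techniques such as SMOTE are often preferred.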


***Here we will be talking about the OverSampling Approach

The chart above shows the given Training Data, with the 2 classes highlighted by the colors Orange [1] and Blue [0].

It's evident that the data is heavily skewed: there are far fewer Orange Dots than Blue Dots.

We can OverSample the Minority Class here using SMOTE.

OverSampling Techniques are:
1. ROSE (Random Over-Sampling Examples)
2. SMOTE (Synthetic Minority Oversampling Technique)
  - It generates the same number of synthetic samples for each original minority sample.
3. ADASYN (ADAptive SYNthetic sampling)
  - It uses the density distribution as a criterion to automatically decide how many synthetic samples to generate for each minority sample, adaptively changing the weights of the different minority samples to compensate for the skewed distribution.
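
The ADASYN weighting idea can be sketched roughly as follows: for each minority sample, count how many of its k nearest neighbours belong to the majority class, normalise those ratios, and hand out the synthetic samples in proportion. This is only an illustration of the allocation step; the function name `adasyn_allocation`, the toy data, and the choice of `k=3` and `beta=1.0` are assumptions for the example.

```python
import math


def adasyn_allocation(X, y, minority_label=1, k=3, beta=1.0):
    """Return {row index: number of synthetic samples} for each minority row (hypothetical helper)."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_minority = len(minority_idx)
    n_majority = len(y) - n_minority
    # Total number of synthetic samples needed; beta=1.0 aims for a full balance.
    total_to_generate = (n_majority - n_minority) * beta

    ratios = []
    for i in minority_idx:
        # k nearest neighbours of X[i] in the whole dataset (excluding itself)
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        n_major_neighbours = sum(1 for j in neighbours if y[j] != minority_label)
        ratios.append(n_major_neighbours / k)

    total = sum(ratios)
    if total == 0:
        # No minority row has majority neighbours, so there is nothing to weight.
        return {i: 0 for i in minority_idx}
    return {i: round(r / total * total_to_generate) for i, r in zip(minority_idx, ratios)}


# Toy 1-D data: a tight minority cluster, one minority point near the border,
# and one minority point sitting in the middle of the majority class.
X = [[0.0], [1.0], [2.0], [3.5], [10.0],
     [4.5], [9.0], [11.0], [12.0], [13.0], [14.0], [15.0], [16.0], [17.0]]
y = [1] * 5 + [0] * 9

allocation = adasyn_allocation(X, y)
print(allocation)  # the minority point surrounded by majority points gets the largest share
```

Unlike SMOTE, the minority rows that sit safely inside their own cluster receive no synthetic samples here, while the harder-to-learn rows receive most of them.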

The diagram below shows SMOTE
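
The core step of SMOTE can also be sketched in a few lines of plain Python: pick a minority sample, pick one of its nearest minority neighbours, and place a new point at a random position on the line between them. This is a simplified illustration and not the imbalanced-learn implementation; the function name `smote_sketch` and the toy points are assumptions for the example.

```python
import math
import random


def smote_sketch(minority, n_per_sample=2, k=3, seed=0):
    """Create `n_per_sample` synthetic points for every minority point (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        for _ in range(n_per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment from x to the neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Four minority points in 2-D
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.2], [2.5, 2.5]]
new_points = smote_sketch(minority, n_per_sample=2, k=3)
print(len(new_points))  # 4 minority points x 2 synthetic points each
```

Because every new point lies between two existing minority points, the synthetic data stays inside the region the minority class already occupies.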

Python Code for Over Sampling via SMOTE

# Classes to perform over-sampling using SMOTE and its variants
from imblearn.over_sampling import SMOTE, SMOTEN, SMOTENC

# Also read about:
# SMOTEN  : over-samples using the SMOTE variant meant for categorical features only.
# SMOTENC : over-samples using SMOTE for a mix of continuous and categorical features.

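# Create the SMOTE object. X is assumed to hold continuous features;
# SMOTEN is meant for categorical features only.
# Note: n_jobs is left out because recent imbalanced-learn releases no longer accept it here.
oversample = SMOTE(sampling_strategy="minority", k_neighbors=5, random_state=100)

# Resample (over-sample) the data; X is the feature matrix and Y the target labels
data_X, data_Y = oversample.fit_resample(X, Y)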


Rahul Aggarwal

http://guardiancoder.in
