Dimensionality Reduction is a family of Unsupervised ML techniques used to reduce the number of features by deriving new features from the old ones.
Here we convert data from a high-dimensional space to a low-dimensional space. Hence it's also known as compression of features.
Major Algorithms for Dimensionality Reduction are:
UMAP (Uniform Manifold Approximation and Projection)
Isomap (Isometric feature mapping)
Isomap is a manifold learning algorithm.
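To make the manifold learning idea concrete, here is a minimal sketch using scikit-learn's Isomap on a synthetic "swiss roll". The dataset, the sample count and the n_neighbors value are illustrative assumptions, not settings taken from this post.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Synthetic 3-D data that actually lies on a rolled-up 2-D sheet (assumed toy data).
X, t = make_swiss_roll(n_samples=500, random_state=0)

# Isomap builds a nearest-neighbour graph and preserves distances measured
# along that graph, which lets it "unroll" the sheet into 2 dimensions.
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

print(X.shape, "->", X_iso.shape)
```

The neighbourhood size usually needs tuning: too small and the graph can fall apart, too large and the roll is short-circuited.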
Two major applications of dimensionality reduction are:
1. Big Data Visualization
2. Feature Extraction/Compression
Big Data Visualization using PCA vs t-SNE vs UMAP
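As a rough sketch of how such a comparison can be set up, the snippet below projects the same data to 2 dimensions with PCA and with t-SNE using scikit-learn. The digits dataset, the 300-sample subset and the perplexity value are assumptions chosen only to keep the example small. UMAP is left out here because it lives in the separate umap-learn package.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# 64 pixel features per image; a small subset keeps t-SNE fast (assumed toy setup).
digits = load_digits()
X, y = digits.data[:300], digits.target[:300]
X_scaled = StandardScaler().fit_transform(X)

# Linear projection onto the two directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Non-linear embedding that tries to keep neighbouring points close together.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled)

print(X_pca.shape, X_tsne.shape)
```

Each result has one 2-D point per image, so both can be drawn as a scatter plot coloured by y to compare how well the digit classes separate.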
Feature Extraction/Compression
PCA (Principal Component Analysis) is used to derive new features known as Principal Components i.e. PC1, PC2, etc. from the original features.
It focuses on capturing the directions of maximum variation in the data set.
A few important things to remember:
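PC1, i.e. the First Principal Component, captures the largest share of the variance present in the original dataset, and each later component captures less. The exact share depends on the data, so check the explained variance ratio rather than assuming a fixed figure such as 90%.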
Principal Components are Orthogonal [Perpendicular] to each other, i.e. they are uncorrelated with each other.
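Feature scaling is strongly recommended before performing PCA whenever features are measured on different scales, because otherwise the features with the largest variance dominate the components.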
PCA can be applied only to numeric data.
It's NOT Feature Selection or Elimination. It's Feature Reduction/Compression.
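The points above can be checked with a small NumPy sketch of PCA done by hand. The toy features (height, weight, income) and their values are made up for illustration, and the SVD route shown here is one common way to compute PCA, not necessarily what any particular library does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 numeric features on very different scales.
height_cm = rng.normal(170, 10, size=200)
weight_kg = 0.5 * height_cm + rng.normal(0, 5, size=200)
income = rng.normal(50_000, 15_000, size=200)
X = np.column_stack([height_cm, weight_kg, income])

# Standardize each feature (zero mean, unit variance) so that income
# does not dominate purely because of its larger units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the standardized data: rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components (3 features -> 2 new features).
X_pca = X_std @ Vt[:2].T

print("explained variance ratio:", np.round(explained_variance_ratio, 3))
print("reduced shape:", X_pca.shape)

# The principal directions are orthonormal ...
print("orthonormal directions:", np.allclose(Vt @ Vt.T, np.eye(3)))
# ... and the resulting component scores are uncorrelated.
print("PC1/PC2 correlation:", round(float(np.corrcoef(X_pca.T)[0, 1]), 6))
```

Note that every new feature is a weighted mix of all three original features, which is why this counts as compression rather than selection.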
The diagrams below show the transformation of the original features into Principal Components (PC1 and PC2).