Dimensionality Reduction

Dimensionality reduction is a family of machine learning techniques, most of them unsupervised, that reduce the number of features by deriving a smaller set of new features from the original ones.

Here we map the data from a high-dimensional space to a low-dimensional space, which is why it is also known as feature compression.
The major algorithms for dimensionality reduction are:
  • Linear methods:
    1. PCA (Principal Component Analysis), computed via either:
      • Eigen decomposition
      • SVD (Singular Value Decomposition)
    2. NMF (Non-Negative Matrix Factorization)
    3. ICA (Independent Component Analysis)
    4. LDA (Linear Discriminant Analysis), which is supervised because it uses class labels
  • Non-linear methods:
    1. MDS (Multidimensional Scaling)
    2. t-SNE (t-Distributed Stochastic Neighbor Embedding)
    3. UMAP (Uniform Manifold Approximation and Projection)
    4. Isomap (Isometric feature mapping), a manifold learning algorithm
    5. GDA (Generalized Discriminant Analysis), a kernel-based extension of LDA
The two major applications of dimensionality reduction are:
    1. Big Data Visualization
    2. Feature Extraction/Compression

Big Data Visualization using PCA vs t-SNE vs UMAP
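
As a rough illustration of how such a comparison might be set up, the sketch below projects the same data to 2-D with PCA and with t-SNE. It assumes scikit-learn is installed; the digits dataset, the 300-sample subset, and the perplexity value are illustrative choices, not something taken from this post. UMAP is left out because it lives in the separate umap-learn package.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Illustrative data: 8x8 digit images flattened to 64 features.
# A 300-sample subset keeps t-SNE fast.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# Put all features on a comparable scale before reducing.
X_scaled = StandardScaler().fit_transform(X)

# Linear projection onto the two directions of largest variance.
pca_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Non-linear embedding that tries to keep neighbouring points close together.
tsne_2d = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(X_scaled)

print("PCA embedding shape:  ", pca_2d.shape)
print("t-SNE embedding shape:", tsne_2d.shape)
```

Each result is an array with one 2-D point per sample, so it can be passed to any scatter plot routine and coloured by `y` to compare how well the two methods separate the classes.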


Feature Extraction/Compression

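PCA (Principal Component Analysis) derives new features, known as principal components (PC1, PC2, etc.), from the original features. Each principal component is a linear combination of the original features.

It captures the directions of maximum variance in the dataset: PC1 points along the direction of greatest variance, PC2 along the direction of greatest remaining variance that is orthogonal to PC1, and so on.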
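
To make this concrete, here is a minimal NumPy sketch of PCA computed in the two ways listed earlier, eigen decomposition and SVD. The function names `pca_eig` and `pca_svd` and the toy data are hypothetical and used only for illustration; in practice a library implementation would normally be used.

```python
import numpy as np


def standardize(X):
    """Scale every feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_eig(X, n_components):
    """PCA via eigen decomposition of the covariance matrix."""
    Z = standardize(X)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort, largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]      # columns are PC directions
    ratio = eigvals[:n_components] / eigvals.sum()
    return Z @ components, components, ratio


def pca_svd(X, n_components):
    """PCA via SVD of the standardized data matrix."""
    Z = standardize(X)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components].T            # columns are PC directions
    ratio = S[:n_components] ** 2 / np.sum(S ** 2)
    return Z @ components, components, ratio


# Hypothetical toy data: two correlated features plus one unrelated feature.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

scores_e, comps_e, ratio_e = pca_eig(X, 2)
scores_s, comps_s, ratio_s = pca_svd(X, 2)

print("explained variance ratio (eig):", ratio_e)
print("explained variance ratio (SVD):", ratio_s)
```

Both routes give the same explained variance ratios and the same directions, although the sign of a component may be flipped because an eigenvector is only defined up to sign.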

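A few important things to remember:
  1. PC1, the first principal component, always explains the largest share of the variance, but how large that share is depends on the dataset. It can be around 90% when the features are highly correlated and far less when they are not.
  2. Principal components are orthogonal (perpendicular) to each other, so the resulting features are uncorrelated. Uncorrelated is not the same as statistically independent in general.
  3. Feature scaling (for example, standardization) is strongly recommended before PCA whenever the features are measured on different scales; otherwise the features with the largest variance dominate the components.
  4. PCA can be applied only to numeric data.
  5. It's not feature selection or elimination. It's feature reduction/compression: every principal component combines all the original features.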
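
The first three points can be checked numerically. The sketch below is only an illustration on synthetic data: the helper `pca_ratio_and_components` and all three datasets are hypothetical, and the exact numbers will vary with the data.

```python
import numpy as np


def pca_ratio_and_components(X, scale=True):
    """Return (explained variance ratios, component matrix) via SVD."""
    Z = X - X.mean(axis=0)
    if scale:
        Z = Z / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return S ** 2 / np.sum(S ** 2), Vt   # rows of Vt are the components


rng = np.random.default_rng(42)
n = 500

# Point 1: PC1's share of the variance depends on the data.
base = rng.normal(size=(n, 1))
correlated = np.hstack([base + 0.05 * rng.normal(size=(n, 1)) for _ in range(4)])
independent = rng.normal(size=(n, 4))
ratio_corr, _ = pca_ratio_and_components(correlated)
ratio_ind, comps = pca_ratio_and_components(independent)
print("PC1 share, correlated features: ", ratio_corr[0])
print("PC1 share, independent features:", ratio_ind[0])

# Point 2: components are orthonormal, so the projected features are uncorrelated.
gram = comps @ comps.T                                # should be the identity
Z = (independent - independent.mean(axis=0)) / independent.std(axis=0, ddof=1)
score_corr = np.corrcoef(Z @ comps.T, rowvar=False)   # should be the identity

# Point 3: without scaling, the feature with the largest units dominates PC1.
mixed = np.column_stack([
    rng.normal(50, 10, n),          # e.g. a feature measured in tens
    rng.normal(50_000, 15_000, n),  # e.g. a feature measured in tens of thousands
])
ratio_unscaled, _ = pca_ratio_and_components(mixed, scale=False)
ratio_scaled, _ = pca_ratio_and_components(mixed, scale=True)
print("PC1 share without scaling:", ratio_unscaled[0])
print("PC1 share with scaling:   ", ratio_scaled[0])
```

With the unscaled data PC1 is almost entirely the large-unit feature, which is why standardizing first usually gives a more meaningful result.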

The diagram below shows the transformation of the original features into principal components (PC1 and PC2).
image source: medium.com

PC1 and PC2 are Orthogonal to each other
image source: analyticsvidhya.com/

Rahul Aggarwal
http://guardiancoder.in

Senior Data Scientist and Gen-AI Engineer
