Motivation and Analogy of Reducing Dimensions | Introduction to Dimensionality Reduction

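Imagine trying to find your way around a city with a map cluttered with unnecessary details. Dimensionality reduction simplifies data in the same way: it keeps the features that carry most of the information and drops or combines the rest, which makes the data easier to analyze and visualize. In machine learning, fewer dimensions mean less computation, and they can help models generalize better by removing noise and redundancy that a model might otherwise overfit to.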


import pandas as pd
import matplotlib.pyplot as plt

# Create a simple dataset with three columns
data = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [50, 60, 70, 80, 90],
    "Age": [20, 25, 30, 35, 40]
})

# Scatter plot using all three features (by color-coding Age)
plt.scatter(data["Height"], data["Weight"], c=data["Age"], cmap="viridis")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.title("Scatter Plot with Age as Color")
plt.colorbar(label="Age")
plt.show()

# Now reduce to two features: drop Age and plot only Height and Weight
plt.scatter(data["Height"], data["Weight"])
plt.xlabel("Height")
plt.ylabel("Weight")
plt.title("Scatter Plot (Reduced: Height vs Weight)")
plt.show()

Analogy: think of dimensionality reduction as decluttering your workspace, where you remove items you don't need so you can focus on what's important. Just as clearing clutter helps you work more efficiently, removing irrelevant features from your data lets you analyze and visualize the most meaningful information more easily.
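
To make the "clutter" idea concrete, here is a minimal sketch that measures how much of the toy dataset's variation actually needs more than one dimension. It assumes NumPy is installed and uses a PCA-style singular value decomposition, a technique this chapter has not introduced yet, so treat it as a preview rather than the lesson's own method. The variable names (`X`, `X_centered`, `explained_variance_ratio`) are illustrative only.

```python
import numpy as np

# Same toy values as the dataset above; columns are Height, Weight, Age
X = np.array([
    [150, 50, 20],
    [160, 60, 25],
    [170, 70, 30],
    [180, 80, 35],
    [190, 90, 40],
], dtype=float)

# Center each column so the decomposition describes variation around the mean
X_centered = X - X.mean(axis=0)

# Singular values measure how much variation lies along each principal direction
_, singular_values, _ = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each direction
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
print(np.round(explained_variance_ratio, 4))
```

Because Height, Weight, and Age rise in lockstep in this toy data, the first ratio comes out at essentially 1 and the other two at essentially 0: a single direction describes all three columns, so removing a dimension loses almost nothing. Real datasets are rarely this tidy, but the same check indicates how many dimensions are worth keeping.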
