Reducing Data to 2D/3D and Visualizing with Matplotlib

Plotting data along its first two or three principal components helps you spot patterns and clusters that are hard to see in the original high-dimensional space. Projecting onto these components keeps the directions of greatest variance, so groupings that reflect the dataset's structure often become visible. This is especially useful for a dataset like Iris, whose four features cannot be plotted directly: reducing them to 2D or 3D makes it easier to distinguish the classes visually.
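To make "projecting data onto these components" concrete, here is a minimal sketch, assuming the standardized Iris data and scikit-learn's default (non-whitened) PCA. It centers the data and takes dot products with the component vectors, then compares the result with `pca.transform`. The variable names `X_manual` and `X_sklearn` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four Iris features (zero mean, unit variance)
X_scaled = StandardScaler().fit_transform(load_iris().data)

# Fit PCA with two components; components_ has shape (2, 4)
pca = PCA(n_components=2).fit(X_scaled)

# Manual projection: center the data, then take the dot product
# of every sample with each principal component vector
X_manual = (X_scaled - pca.mean_) @ pca.components_.T

# scikit-learn's own projection, for comparison
X_sklearn = pca.transform(X_scaled)

print(X_manual.shape)                     # (150, 2)
print(np.allclose(X_manual, X_sklearn))   # True
```

Each row of the result holds one sample's coordinates along PC1 and PC2, which are the values a 2D scatter plot places on its x and y axes.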


# 2D scatter plot of the first two principal components
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load and scale the data
data = load_iris()
X = data.data
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA and transform to 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.figure(figsize=(8,6))
# Map numeric targets to species names so the legend is readable
sns.scatterplot(x=X_pca[:,0], y=X_pca[:,1], hue=data.target_names[data.target], palette='Set1', s=60)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA - Iris Dataset (2D)')
plt.legend(title='Species')
plt.show()

# 3D visualization
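# Explicit import is only required on Matplotlib older than 3.2;
# newer versions register the '3d' projection automatically
from mpl_toolkits.mplot3d import Axes3D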
pca_3d = PCA(n_components=3)
X_pca_3d = pca_3d.fit_transform(X_scaled)
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(X_pca_3d[:,0], X_pca_3d[:,1], X_pca_3d[:,2], c=data.target, cmap='Set1', s=60)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_zlabel('PC3')
ax.set_title('PCA - Iris Dataset (3D)')
plt.show()

The 2D scatter plot shows how samples are distributed along the first two principal components, often revealing clusters that correspond to different classes. The 3D plot improves separation only if the third component explains a meaningful share of the variance; the fitted model's explained_variance_ratio_ attribute tells you how much each component contributes. Visualizing the data this way shows how well PCA captures the essential structure of your dataset and whether two or three components are enough for your analysis.
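One way to judge whether the third component is worth plotting is to inspect how much variance each component explains. The following is a minimal sketch, assuming the same standardized Iris data as above. The names `ratios` and `cumulative` are illustrative, and what counts as a "significant" share is a judgment call, not a fixed rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features, then fit a three-component PCA
X_scaled = StandardScaler().fit_transform(load_iris().data)
pca_3d = PCA(n_components=3).fit(X_scaled)

# Fraction of total variance explained by each component,
# and the running total across components
ratios = pca_3d.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.1%} of variance, {c:.1%} cumulative")
```

If the cumulative share is already high after two components, the 2D plot likely conveys most of the structure and the third axis adds little.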

Section 1. Chapter 11
