Reducing Data to 2D/3D and Visualizing with Matplotlib
Visualizing data with the first two or three principal components helps you spot patterns and clusters that are hidden in high-dimensional space. By projecting data onto these components, you can see groupings that reveal the dataset's structure. This is especially useful for datasets like Iris, where reducing to 2D or 3D makes it easier to distinguish between classes and understand the data visually.
```python
# 2D scatter plot of the first two principal components
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load and scale the data
data = load_iris()
X = data.data
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA and transform to 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=data.target, palette='Set1', s=60)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA - Iris Dataset (2D)')
plt.legend(title='Species')
plt.show()

# 3D visualization
from mpl_toolkits.mplot3d import Axes3D

pca_3d = PCA(n_components=3)
X_pca_3d = pca_3d.fit_transform(X_scaled)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(X_pca_3d[:, 0], X_pca_3d[:, 1], X_pca_3d[:, 2],
                     c=data.target, cmap='Set1', s=60)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_zlabel('PC3')
plt.title('PCA - Iris Dataset (3D)')
plt.show()
```
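In the plots above, the legend and colors are keyed to the integer class labels (0, 1, 2). If you would rather see the species names in the legend, one small optional refinement is to index `data.target_names` with `data.target`; this sketch reuses `X_pca` and `data` from the block above:

```python
# Map integer class labels (0, 1, 2) to species names for a readable legend
species = data.target_names[data.target]

plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=species, palette='Set1', s=60)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA - Iris Dataset (2D, labeled by species)')
plt.show()
```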
The 2D scatter plot shows how samples are distributed along the first two principal components, often revealing clusters corresponding to different classes. The 3D plot can provide even more separation if the third component adds significant variance. By visualizing the data in this way, you gain insights into how well PCA is capturing the essential structure of your dataset and whether further dimensionality reduction might be appropriate for your analysis.
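Whether the third component "adds significant variance" is easy to check directly: the fitted PCA object exposes `explained_variance_ratio_`. A minimal sketch, reusing the `pca_3d` object fitted in the code above:

```python
import numpy as np

# Fraction of total variance explained by each component individually
print(pca_3d.explained_variance_ratio_)

# Cumulative variance: for the scaled Iris data, the first two components
# already account for roughly 96% of the variance, so the third adds
# relatively little separation here.
print(np.cumsum(pca_3d.explained_variance_ratio_))
```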