Comparing Model Performance Before and After PCA
PCA can be used as a preprocessing step before training machine learning models. In this chapter, you will compare the performance of a LogisticRegression classifier on the original standardized data and on data reduced to two principal components. This practical approach highlights how dimensionality reduction can impact both the effectiveness and efficiency of your models.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA

# X_scaled (standardized features) and data (the loaded dataset)
# come from the previous chapters

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, data.target, test_size=0.3, random_state=42
)

# Train on original data
clf_orig = LogisticRegression(max_iter=200)
clf_orig.fit(X_train, y_train)
y_pred_orig = clf_orig.predict(X_test)
acc_orig = accuracy_score(y_test, y_pred_orig)

# Train on PCA-reduced data (2 components)
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)  # fit PCA on the training set only
X_test_pca = pca.transform(X_test)        # apply the same projection to the test set
clf_pca = LogisticRegression(max_iter=200)
clf_pca.fit(X_train_pca, y_train)
y_pred_pca = clf_pca.predict(X_test_pca)
acc_pca = accuracy_score(y_test, y_pred_pca)

print(f"Accuracy on original data: {acc_orig:.2f}")
print(f"Accuracy after PCA (2 components): {acc_pca:.2f}")
```
The code above splits the data, trains a logistic regression model on both the original and the PCA-reduced data, and compares their accuracies. Note that a perfect accuracy of 1.0 on the original data may indicate overfitting: the model fits the data too closely and may not generalize well. Applying PCA reduces dimensionality, which can help mitigate overfitting. After PCA, accuracy drops slightly to 0.91, a small loss in raw performance in exchange for better generalization, faster training, and greater interpretability.
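You can probe these trade-offs directly. The sketch below, which reuses the pca, X_train, y_train, and X_train_pca objects defined above, checks how much of the total variance the two components retain and roughly compares fit times; the printed numbers will vary with your dataset and hardware.

```python
import time

# Fraction of total variance captured by each retained component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print(f"Total variance retained: {pca.explained_variance_ratio_.sum():.2%}")

# Rough fit-time comparison between the full and reduced feature spaces
start = time.perf_counter()
LogisticRegression(max_iter=200).fit(X_train, y_train)
t_orig = time.perf_counter() - start

start = time.perf_counter()
LogisticRegression(max_iter=200).fit(X_train_pca, y_train)
t_pca = time.perf_counter() - start

print(f"Fit time on original data: {t_orig:.4f}s")
print(f"Fit time after PCA:        {t_pca:.4f}s")
```

If the two components retain most of the variance, the accuracy drop should be small, which is exactly the pattern observed above.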