Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lernen Challenge: Classifying Inseparable Data | Section
Practice
Projects
Quizzes & Challenges
Quizze
Challenges
/
Classification with Python

bookChallenge: Classifying Inseparable Data

You will use the following dataset with two features:

1234
import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
copy

If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:

123456
import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y']) plt.show()
copy

Let's use cross-validation to evaluate a simple logistic regression on this data:

123456789101112131415161718
import pandas as pd import matplotlib.pyplot as plt from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) y_pred = lr.predict(X) plt.scatter(df['X1'], df['X2'], c=y_pred) plt.show() print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')
copy

As you can see, regular Logistic Regression is not suited for this task. Using polynomial regression may help improve the model's performance. Additionally, employing GridSearchCV allows you to find the optimal C parameter for better accuracy.

This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each step in the pipeline.

Aufgabe

Swipe to start coding

You are given a dataset described as a DataFrame in the df variable.

  • Create a pipeline that will hold the polynomial features of degree 2 of X and be scaled and store the resulting pipeline in the pipe variable.
  • Create a param_grid dictionary to with values [0.01, 0.1, 1, 10, 100] of the C hyperparameter.
  • Initialize and train a GridSearchCV object and store the trained object in the grid_cv variable.

Lösung

War alles klar?

Wie können wir es verbessern?

Danke für Ihr Feedback!

Abschnitt 1. Kapitel 13
single

single

Fragen Sie AI

expand

Fragen Sie AI

ChatGPT

Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen

close

bookChallenge: Classifying Inseparable Data

Swipe um das Menü anzuzeigen

You will use the following dataset with two features:

1234
import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
copy

If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:

123456
import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y']) plt.show()
copy

Let's use cross-validation to evaluate a simple logistic regression on this data:

123456789101112131415161718
import pandas as pd import matplotlib.pyplot as plt from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) y_pred = lr.predict(X) plt.scatter(df['X1'], df['X2'], c=y_pred) plt.show() print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')
copy

As you can see, regular Logistic Regression is not suited for this task. Using polynomial regression may help improve the model's performance. Additionally, employing GridSearchCV allows you to find the optimal C parameter for better accuracy.

This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each step in the pipeline.

Aufgabe

Swipe to start coding

You are given a dataset described as a DataFrame in the df variable.

  • Create a pipeline that will hold the polynomial features of degree 2 of X and be scaled and store the resulting pipeline in the pipe variable.
  • Create a param_grid dictionary to with values [0.01, 0.1, 1, 10, 100] of the C hyperparameter.
  • Initialize and train a GridSearchCV object and store the trained object in the grid_cv variable.

Lösung

Switch to desktopWechseln Sie zum Desktop, um in der realen Welt zu übenFahren Sie dort fort, wo Sie sind, indem Sie eine der folgenden Optionen verwenden
War alles klar?

Wie können wir es verbessern?

Danke für Ihr Feedback!

Abschnitt 1. Kapitel 13
single

single

some-alt