Challenge: Classifying Inseparable Data

You will use the following dataset with two features:


import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
print(df.head())

If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:


import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
plt.scatter(df['X1'], df['X2'], c=df['y'])
plt.show()

Let's use cross-validation to evaluate a simple logistic regression on this data:


import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']

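# Standardize both features, then fit a plain logistic regression
X = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(X, y)

# Color each point by its predicted class to see the linear decision boundary
y_pred = lr.predict(X)
plt.scatter(df['X1'], df['X2'], c=y_pred)
plt.show()

# cross_val_score refits a clone of lr on each fold (5 folds by default)
print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')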
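
The snippet above prints only the mean of the cross-validation scores. As a minimal sketch, the per-fold scores can be inspected as well. The data here is a synthetic stand-in generated with make_circles (an assumption, since the contents of circles.csv are not reproduced here), and the names X_demo and y_demo are hypothetical.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic concentric circles: an assumed stand-in for circles.csv
X_demo, y_demo = make_circles(n_samples=300, noise=0.1, factor=0.5, random_state=0)

# cv=5 requests five folds; one accuracy value is returned per fold
scores = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=5)

print(scores)
print(f'Mean accuracy: {scores.mean():.2f}')
```

On data of this shape a linear model has no useful straight-line boundary to find, so the fold scores tend to stay close to chance level.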

As you can see, plain logistic regression is not suited to this task: its decision boundary is a straight line, which cannot separate these classes. Adding polynomial features (polynomial regression) may improve the model's performance. In addition, GridSearchCV lets you search for the value of the C hyperparameter that gives the best cross-validated accuracy.

This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method applies .fit_transform() to each step in turn, passing the output of one step on to the next.
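
The following sketch illustrates that behavior by comparing a pipeline's output with the same two steps applied by hand. The random input X_demo, the step names 'poly' and 'scale', and the variable pipe_demo are all hypothetical choices made for this illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical two-feature input standing in for df[['X1', 'X2']]
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))

# Pipeline of two preprocessing steps: polynomial expansion, then scaling
pipe_demo = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scale', StandardScaler()),
])
X_pipe = pipe_demo.fit_transform(X_demo)

# The same steps applied manually, feeding one output into the next
X_step = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)
X_manual = StandardScaler().fit_transform(X_step)

same = np.allclose(X_pipe, X_manual)
print(same)
```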



Task


You are given a dataset stored as a DataFrame in the df variable.

Create a pipeline that generates polynomial features of degree 2 from X and then scales them, and store the resulting pipeline in the pipe variable.
Create a param_grid dictionary with the values [0.01, 0.1, 1, 10, 100] for the C hyperparameter.
Initialize and train a GridSearchCV object, and store the trained object in the grid_cv variable.
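
One possible way to approach these steps is sketched below. It is not the course's reference solution: the data is a synthetic stand-in from make_circles rather than circles.csv, and the use of make_pipeline, include_bias=False, and the default GridSearchCV settings are assumptions made for this illustration.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for circles.csv (an assumption, not the course data)
X, y = make_circles(n_samples=300, noise=0.1, factor=0.5, random_state=0)

# Step 1: degree-2 polynomial features followed by scaling
pipe = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                     StandardScaler())
X_poly = pipe.fit_transform(X)

# Step 2: candidate values for the regularization hyperparameter C
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# Step 3: cross-validated search over C on the transformed features
grid_cv = GridSearchCV(LogisticRegression(), param_grid=param_grid)
grid_cv.fit(X_poly, y)

print(grid_cv.best_params_)
print(f'Best cross-validation accuracy: {grid_cv.best_score_:.2f}')
```

An alternative design, not required by the task, is to append LogisticRegression to the pipeline and search over 'logisticregression__C', so that the scaler is refit inside each cross-validation fold.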
