Challenge: Classifying Inseparable Data
You will use the following dataset with two features:
1234import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:
123456import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y']) plt.show()
Let's use cross-validation to evaluate a simple logistic regression on this data:
123456789101112131415161718import pandas as pd import matplotlib.pyplot as plt from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) y_pred = lr.predict(X) plt.scatter(df['X1'], df['X2'], c=y_pred) plt.show() print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')
As you can see, regular Logistic Regression is not suited for this task. Using polynomial regression may help improve the model's performance. Additionally, employing GridSearchCV allows you to find the optimal C parameter for better accuracy.
This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each step in the pipeline.
Swipe to start coding
You are given a dataset described as a DataFrame in the df variable.
- Create a pipeline that will hold the polynomial features of degree 2 of
Xand be scaled and store the resulting pipeline in thepipevariable. - Create a
param_griddictionary to with values[0.01, 0.1, 1, 10, 100]of theChyperparameter. - Initialize and train a
GridSearchCVobject and store the trained object in thegrid_cvvariable.
Рішення
Дякуємо за ваш відгук!
single
Запитати АІ
Запитати АІ
Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат
Чудово!
Completion показник покращився до 5.88
Challenge: Classifying Inseparable Data
Свайпніть щоб показати меню
You will use the following dataset with two features:
1234import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:
123456import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y']) plt.show()
Let's use cross-validation to evaluate a simple logistic regression on this data:
123456789101112131415161718import pandas as pd import matplotlib.pyplot as plt from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) y_pred = lr.predict(X) plt.scatter(df['X1'], df['X2'], c=y_pred) plt.show() print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')
As you can see, regular Logistic Regression is not suited for this task. Using polynomial regression may help improve the model's performance. Additionally, employing GridSearchCV allows you to find the optimal C parameter for better accuracy.
This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each step in the pipeline.
Swipe to start coding
You are given a dataset described as a DataFrame in the df variable.
- Create a pipeline that will hold the polynomial features of degree 2 of
Xand be scaled and store the resulting pipeline in thepipevariable. - Create a
param_griddictionary to with values[0.01, 0.1, 1, 10, 100]of theChyperparameter. - Initialize and train a
GridSearchCVobject and store the trained object in thegrid_cvvariable.
Рішення
Дякуємо за ваш відгук!
single