Challenge: Classifying Inseparable Data
In this Challenge, you are given the following dataset:
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
print(df.head())
Here is its plot.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
# Color each point by its class label y
plt.scatter(df['X1'], df['X2'], c=df['y'])
plt.show()
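If the URL above is unreachable, a dataset of a similar shape can be approximated offline. The sketch below is an assumption, not the original data: it uses scikit-learn's make_circles to build two noisy concentric rings and stores them in the same X1, X2 and y columns. The n_samples, noise, factor and random_state values are illustrative guesses, not the settings used to create circles.csv.

```python
import pandas as pd
from sklearn.datasets import make_circles

# Hypothetical offline stand-in for circles.csv: two noisy concentric rings.
# n_samples, noise, factor and random_state are illustrative guesses only.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=42)

# Use the same column names as the original dataset
df = pd.DataFrame({'X1': X[:, 0], 'X2': X[:, 1], 'y': y})
print(df.head())
```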
The dataset is clearly not linearly separable: no straight line can divide the two classes. Let's look at how Logistic Regression performs on it:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']
X = StandardScaler().fit_transform(X)

# cross_val_score fits its own copy of the model on each fold,
# so the estimator does not need to be fitted beforehand
lr = LogisticRegression()
print(cross_val_score(lr, X, y).mean())
The score is poor: plain Logistic Regression draws a linear decision boundary, so it is not suited for this task. Your task is to check whether PolynomialFeatures will help. To find the best C parameter, you will use the GridSearchCV class.
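Why should degree-2 features help? If the classes form concentric rings, the boundary between them is roughly a circle, x1^2 + x2^2 = r^2. That boundary is not linear in x1 and x2, but it is linear in the squared terms x1^2 and x2^2. PolynomialFeatures(degree=2) adds exactly those terms, so a linear model trained on the expanded features can draw a circular boundary in the original plane. The sketch below assumes a make_circles stand-in for circles.csv (its parameters are illustrative) and compares cross-validated accuracy with and without the expansion.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in data (assumption): two noisy concentric rings
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=42)
X = StandardScaler().fit_transform(X)

# Degree-2 expansion of (x0, x1): [1, x0, x1, x0^2, x0*x1, x1^2]
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

# Mean accuracy over 5 cross-validation folds for each feature set
raw_score = cross_val_score(LogisticRegression(), X, y).mean()
poly_score = cross_val_score(LogisticRegression(), X_poly, y).mean()
print(f'raw features:      {raw_score:.3f}')
print(f'degree-2 features: {poly_score:.3f}')
```

On data shaped like this, the raw-feature score tends to stay near chance level, while the expanded features let the same linear model separate the rings.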
In this challenge, a Pipeline is used. You can think of it as a list of preprocessing steps: its .fit_transform() method calls .fit_transform() on each step in order, passing the output of one step as the input to the next.
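To make this behavior concrete, here is a small sketch that checks a two-step pipeline (PolynomialFeatures, then StandardScaler) against the same two transformers applied by hand. The step names 'poly' and 'scaler' and the make_circles input are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative input data (assumption, not circles.csv)
X, _ = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=0)

# A pipeline is an ordered list of (name, transformer) pairs
pipe = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('scaler', StandardScaler()),
])
X_pipe = pipe.fit_transform(X)

# The same steps by hand: the output of the first step feeds the second
X_manual = StandardScaler().fit_transform(PolynomialFeatures(degree=2).fit_transform(X))

print(np.allclose(X_pipe, X_manual))  # True
```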
Build a Logistic Regression model with polynomial features and find the best C parameter using GridSearchCV:
- Create a pipeline that produces an X_poly variable holding the degree-2 polynomial features of X, scaled.
- Create a param_grid dictionary that tells GridSearchCV to try the values [0.01, 0.1, 1, 10, 100] for the C parameter.
- Initialize and train a GridSearchCV object.
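One possible way to carry out these three steps, sketched on a make_circles stand-in rather than circles.csv (its parameters are illustrative guesses). X_poly and param_grid follow the task's naming; grid_cv is a hypothetical name, and max_iter=1000 is an added assumption to avoid convergence warnings at large C.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in data (assumption): two noisy concentric rings
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=42)

# Step 1: degree-2 polynomial features of X, then scaling
X_poly = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('scaler', StandardScaler()),
]).fit_transform(X)

# Step 2: candidate values for C (the inverse of regularization strength)
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# Step 3: try every C with 5-fold cross-validation (the default)
grid_cv = GridSearchCV(LogisticRegression(max_iter=1000), param_grid).fit(X_poly, y)
print(grid_cv.best_params_)
print(grid_cv.best_score_)
```

In this sketch the scaler is fitted on all of X before the search, so every validation fold has slightly influenced the scaling. A leak-free variant would put the polynomial step, the scaler and LogisticRegression into one Pipeline and pass that to GridSearchCV, naming the grid key 'lr__C' if the model step is named 'lr'; scikit-learn addresses a step's parameters as <step name>__<parameter>.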