Classification with Python
Challenge: Classifying Inseparable Data
In this Challenge, you are given the following dataset:
    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
    print(df.head())
Here is its plot.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
    # Color each point by its class label
    plt.scatter(df['X1'], df['X2'], c=df['y'])
    plt.show()
The dataset is clearly not linearly separable. Let's look at how Logistic Regression performs on it:
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
    X = df[['X1', 'X2']]
    y = df['y']
    X = StandardScaler().fit_transform(X)
    # cross_val_score fits its own clones of the model, so no separate .fit() call is needed
    lr = LogisticRegression()
    print(cross_val_score(lr, X, y).mean())
The result is poor: plain Logistic Regression can only draw a linear decision boundary, so it is not suited for this task. Your task is to check whether PolynomialFeatures will help. To find the best C parameter, you will use the GridSearchCV class.
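To make the idea concrete, here is a minimal sketch of what a degree-2 PolynomialFeatures transform produces. The sample values are made up for illustration and do not come from circles.csv. The squared columns are what matter here: with them, a linear model can express a boundary such as x1^2 + x2^2 = r^2, which is a circle in the original feature space.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One toy sample with two features: x1 = 2, x2 = 3 (illustrative values only)
sample = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(sample)

# Columns are, in order: 1, x1, x2, x1^2, x1*x2, x2^2
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]
```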
This challenge uses a Pipeline. You can think of it as a list of preprocessing steps: its .fit_transform() method applies .fit_transform() to each step in sequence, passing each step's output to the next.
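The chaining can be sketched as follows. This is an illustration rather than the challenge solution: it uses synthetic data from make_circles as a stand-in for circles.csv, and the step names 'poly' and 'scaler', the X_demo variables, and include_bias=False are choices made for this example only.

```python
from sklearn.datasets import make_circles
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the challenge data (assumption: two features, binary target)
X_demo, y_demo = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=0)

# Each step is a (name, transformer) pair; steps run in the order listed
pipe = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
])

# Polynomial features are generated first, then the result is scaled
X_demo_poly = pipe.fit_transform(X_demo)
print(X_demo_poly.shape)  # (200, 5): x1, x2, x1^2, x1*x2, x2^2
```

Dropping the bias column with include_bias=False is optional here; LogisticRegression fits its own intercept by default, and a constant column carries no information after scaling.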
Build a Logistic Regression model with polynomial features and find the best C parameter using GridSearchCV.
- Create a pipeline to make an X_poly variable that holds the degree-2 polynomial features of X, scaled.
- Create a param_grid dictionary to tell GridSearchCV to try the values [0.01, 0.1, 1, 10, 100] for the C parameter.
- Initialize and train a GridSearchCV object.
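The three steps above might look roughly like this on synthetic data. This is a hedged sketch, not the reference solution: make_circles stands in for circles.csv, the names X_demo, y_demo, and grid are hypothetical, and GridSearchCV is left at its default cross-validation settings.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the challenge data (assumption, not circles.csv)
X_demo, y_demo = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=0)

# Step 1: degree-2 polynomial features, then scaling
pipe = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
])
X_demo_poly = pipe.fit_transform(X_demo)

# Step 2: candidate values for the regularization parameter C
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# Step 3: cross-validated search over the grid
grid = GridSearchCV(LogisticRegression(), param_grid)
grid.fit(X_demo_poly, y_demo)

print(grid.best_params_)  # the C value with the best mean CV score
print(grid.best_score_)   # that mean CV accuracy
```

After fitting, grid.best_estimator_ holds a LogisticRegression refit on all the data with the selected C, so it can be used directly for predictions.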