Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Challenge: Classifying Unseparateble Data | Logistic Regression
Classification with Python
course content

Course Content

Classification with Python

Classification with Python

1. k-NN Classifier
2. Logistic Regression
3. Decision Tree
4. Random Forest
5. Comparing Models

bookChallenge: Classifying Unseparateble Data

In this Challenge, you are given the following dataset:

1234
import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
copy

Here is its plot.

12345
import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y'])
copy

The dataset is for sure not linearly separable. Let's look at the Logistic Regression performance:

123456789101112
import pandas as pd from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) print(cross_val_score(lr, X, y).mean())
copy

The result is awful. Regular Logistic Regression is not suited for this task. Your task is to check whether the PolynomialFeatures will help. To find the best C parameter, you will use the GridSearchCV class.

In this challenge, the Pipeline is used. You can think of it as a list of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each item.

Task
test

Swipe to show code editor

Build a Logistic Regression model with polynomial features and find the best C parameter using GridSearchCV

  1. Create a pipeline to make an X_poly variable that will hold the polynomial features of degree 2 of X and be scaled.
  2. Create a param_grid dictionary to tell the GridSearchCV you want to try values [0.01, 0.1, 1, 10, 100] of a C parameter.
  3. Initialize and train a GridSearchCV object.

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 2. Chapter 6
toggle bottom row

bookChallenge: Classifying Unseparateble Data

In this Challenge, you are given the following dataset:

1234
import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
copy

Here is its plot.

12345
import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y'])
copy

The dataset is for sure not linearly separable. Let's look at the Logistic Regression performance:

123456789101112
import pandas as pd from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) print(cross_val_score(lr, X, y).mean())
copy

The result is awful. Regular Logistic Regression is not suited for this task. Your task is to check whether the PolynomialFeatures will help. To find the best C parameter, you will use the GridSearchCV class.

In this challenge, the Pipeline is used. You can think of it as a list of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each item.

Task
test

Swipe to show code editor

Build a Logistic Regression model with polynomial features and find the best C parameter using GridSearchCV

  1. Create a pipeline to make an X_poly variable that will hold the polynomial features of degree 2 of X and be scaled.
  2. Create a param_grid dictionary to tell the GridSearchCV you want to try values [0.01, 0.1, 1, 10, 100] of a C parameter.
  3. Initialize and train a GridSearchCV object.

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 2. Chapter 6
toggle bottom row

bookChallenge: Classifying Unseparateble Data

In this Challenge, you are given the following dataset:

1234
import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
copy

Here is its plot.

12345
import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y'])
copy

The dataset is for sure not linearly separable. Let's look at the Logistic Regression performance:

123456789101112
import pandas as pd from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) print(cross_val_score(lr, X, y).mean())
copy

The result is awful. Regular Logistic Regression is not suited for this task. Your task is to check whether the PolynomialFeatures will help. To find the best C parameter, you will use the GridSearchCV class.

In this challenge, the Pipeline is used. You can think of it as a list of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each item.

Task
test

Swipe to show code editor

Build a Logistic Regression model with polynomial features and find the best C parameter using GridSearchCV

  1. Create a pipeline to make an X_poly variable that will hold the polynomial features of degree 2 of X and be scaled.
  2. Create a param_grid dictionary to tell the GridSearchCV you want to try values [0.01, 0.1, 1, 10, 100] of a C parameter.
  3. Initialize and train a GridSearchCV object.

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

In this Challenge, you are given the following dataset:

1234
import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') print(df.head())
copy

Here is its plot.

12345
import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') plt.scatter(df['X1'], df['X2'], c=df['y'])
copy

The dataset is for sure not linearly separable. Let's look at the Logistic Regression performance:

123456789101112
import pandas as pd from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv') X = df[['X1', 'X2']] y = df['y'] X = StandardScaler().fit_transform(X) lr = LogisticRegression().fit(X, y) print(cross_val_score(lr, X, y).mean())
copy

The result is awful. Regular Logistic Regression is not suited for this task. Your task is to check whether the PolynomialFeatures will help. To find the best C parameter, you will use the GridSearchCV class.

In this challenge, the Pipeline is used. You can think of it as a list of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each item.

Task
test

Swipe to show code editor

Build a Logistic Regression model with polynomial features and find the best C parameter using GridSearchCV

  1. Create a pipeline to make an X_poly variable that will hold the polynomial features of degree 2 of X and be scaled.
  2. Create a param_grid dictionary to tell the GridSearchCV you want to try values [0.01, 0.1, 1, 10, 100] of a C parameter.
  3. Initialize and train a GridSearchCV object.

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Section 2. Chapter 6
Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
We're sorry to hear that something went wrong. What happened?
some-alt