Challenge: Creating a Complete ML Pipeline
Now let's build a complete pipeline that ends with a final estimator. The result is a trained prediction pipeline that can predict new instances with a single call to the .predict() method.
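To make the idea concrete before the challenge, here is a minimal sketch of a pipeline with a final estimator. The data is made up for illustration (it is not the penguins dataset), and the names X_demo, y_demo, demo_pipe, and X_new are assumptions introduced only for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up numeric data: two well-separated groups, one missing value
X_demo = np.array([[1.0, 2.0], [1.5, np.nan], [2.0, 2.5],
                   [8.0, 9.0], [8.5, 9.5], [9.0, 10.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

# Preprocessing steps first, the estimator last
demo_pipe = make_pipeline(SimpleImputer(strategy='mean'),
                          StandardScaler(),
                          KNeighborsClassifier(n_neighbors=3))
demo_pipe.fit(X_demo, y_demo)

# New instances go through the same imputing and scaling before prediction
X_new = np.array([[1.2, 2.2], [8.8, np.nan]])
demo_pred = demo_pipe.predict(X_new)
print(demo_pred)
```

Because the estimator is the last step, one .fit() call trains the preprocessing and the model together, and one .predict() call applies all of it to unseen rows.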
To train a predictor (model), y must be encoded. This is done separately from the pipeline we build for X. Remember that LabelEncoder is used for encoding the target.
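As a quick refresher, the sketch below shows LabelEncoder turning string labels into integers and back. The short species list is just an illustrative sample, and the variable names are assumptions for this example.

```python
from sklearn.preprocessing import LabelEncoder

# A small illustrative sample of target labels
species = ['Adelie', 'Gentoo', 'Chinstrap', 'Adelie']

enc = LabelEncoder()
encoded = enc.fit_transform(species)  # classes are numbered in sorted order

print(enc.classes_)                    # the learned classes, sorted alphabetically
print(encoded)                         # [0 2 1 0]
print(enc.inverse_transform(encoded))  # back to the original strings
```

Keeping the fitted encoder around matters: the model predicts integers, and inverse_transform is what maps them back to readable class names.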
Task
You have the same penguins dataset. The task is to build a pipeline with KNeighborsClassifier as the final estimator, train it, and predict values for X itself.
- Encode the y variable.
- Create a pipeline containing ct, SimpleImputer, StandardScaler, and KNeighborsClassifier.
- Train the pipe object using the features X and the target y.
Solution
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Remove rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assign the X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, StandardScaler, and KNeighborsClassifier
pipe = make_pipeline(ct,
                     SimpleImputer(strategy='most_frequent'),
                     StandardScaler(),
                     KNeighborsClassifier())
# Train the model
pipe.fit(X, y)
# Print predictions
y_pred = pipe.predict(X)  # Get encoded predictions
print(label_enc.inverse_transform(y_pred))  # Decode predictions and print
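The solution predicts for X itself, but the same steps also work for a row the model has never seen. The sketch below repeats the pipeline shape on a tiny hand-made table so that it runs without downloading anything. All column names and values here (place, length, weight, kind) are invented for illustration and are not taken from the penguins data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical toy table (invented values, not the penguins dataset)
train = pd.DataFrame({
    'place': ['north', 'north', 'north', 'south', 'south', 'south'],
    'length': [10.0, 11.0, 12.0, 30.0, 31.0, 32.0],
    'weight': [1.0, 1.1, 1.2, 5.0, 5.1, 5.2],
    'kind': ['small', 'small', 'small', 'large', 'large', 'large'],
})
X_toy, y_toy = train.drop('kind', axis=1), train['kind']

# The target is encoded outside the pipeline
toy_enc = LabelEncoder()
y_toy_encoded = toy_enc.fit_transform(y_toy)

# Same structure as the solution: encode, impute, scale, classify
toy_ct = make_column_transformer((OneHotEncoder(), ['place']),
                                 remainder='passthrough')
toy_pipe = make_pipeline(toy_ct,
                         SimpleImputer(strategy='most_frequent'),
                         StandardScaler(),
                         KNeighborsClassifier(n_neighbors=3))
toy_pipe.fit(X_toy, y_toy_encoded)

# One unseen row with a missing numeric value; the pipeline imputes it
new_row = pd.DataFrame({'place': ['south'], 'length': [29.0], 'weight': [np.nan]})
pred = toy_enc.inverse_transform(toy_pipe.predict(new_row))
print(pred)
```

The new row must have the same feature columns as the training features, since the ColumnTransformer selects columns by name.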
# Starter code: replace each ___ placeholder with your own code
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Remove rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assign the X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = ___
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, StandardScaler, and KNeighborsClassifier
pipe = ___(___, ___(___=___), ___(), ___())
# Train the model
___
# Print predictions
y_pred = pipe.predict(X)  # Get encoded predictions
print(label_enc.inverse_transform(y_pred))  # Decode predictions and print