Learn Challenge 3: Pipelines | Scikit-learn
Data Science Interview Challenge

Challenge 3: Pipelines

Pipelines streamline machine learning workflows by passing data from one processing stage to the next in a fixed, reproducible order. A pipeline bundles a sequence of data processing steps and a final model into a single estimator. The primary advantage is that it reduces common workflow errors, such as data leakage when standardizing or normalizing data: every transformer in the pipeline is fit only on the data passed to the pipeline's fit method, so statistics from held-out data never influence the preprocessing.
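To make the data leakage point concrete, here is a minimal sketch. It assumes a train/test split of the wine dataset, which is not part of the challenge itself; the 30% test size and the explicit n_init=10 are illustrative choices. It checks that the scaler inside the pipeline learned its means from the training rows only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# Assumption for illustration: hold out 30% of the wines as unseen data
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # n_init is set explicitly so behavior does not depend on the library default
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# fit() fits the scaler on X_train only, then fits KMeans on the scaled X_train
pipe.fit(X_train)

# The scaler's learned means match the training data, not the full dataset
scaler_mean = pipe.named_steps["scaler"].mean_
leak_free = np.allclose(scaler_mean, X_train.mean(axis=0))

# predict() reuses the training statistics to scale X_test before assigning clusters
test_clusters = pipe.predict(X_test)
print(leak_free, test_clusters.shape)
```

Scaling the full dataset before splitting would instead let the test rows influence the mean and standard deviation used during training, which is the leakage a pipeline helps to avoid.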

Task


Apply data scaling to the wine dataset, and then use the KMeans algorithm for clustering wines based on their chemical properties.

  1. Apply standard scaling to the features of the wine dataset.
  2. Use the KMeans algorithm to cluster wines based on their chemical properties. You need 3 clusters.
  3. Apply the pipeline to the data.

Solution

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
import warnings

# Ignore warnings
warnings.filterwarnings('ignore')

# Load Wine dataset
wine = load_wine()
X = wine.data

# 1. Create a pipeline that first applies standard scaling and then KMeans clustering
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('kmeans', KMeans(n_clusters=3, random_state=0))
])

# 2. Apply the pipeline to the data
clusters = pipeline.fit_predict(X)

# Result distribution
plt.title('Cluster assignments:')
sns.countplot(x=clusters)
plt.show()
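As a sanity check on what the pipeline in the solution does, the sketch below compares it with the same two steps run by hand and then inspects the fitted steps. The n_init=10 argument is an assumption added here to pin down behavior across scikit-learn versions; it does not appear in the solution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# Pipeline version: scaling and clustering in one call
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
pipeline_labels = pipeline.fit_predict(X)

# Manual version: the same two steps applied one after the other
X_scaled = StandardScaler().fit_transform(X)
manual_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# With identical settings and seed, both routes should give the same labels
same_labels = np.array_equal(pipeline_labels, manual_labels)

# Number of wines per cluster, and the centroids stored on the fitted KMeans step
cluster_sizes = np.bincount(pipeline_labels, minlength=3)
centers_shape = pipeline.named_steps["kmeans"].cluster_centers_.shape
print(same_labels, cluster_sizes, centers_shape)
```

The fitted steps stay accessible by name through named_steps, so the centroids (expressed in scaled feature space) can be read from the pipeline without refitting anything.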


Section 7. Chapter 3

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
import warnings

# Ignore warnings
warnings.filterwarnings('ignore')

# Load Wine dataset
wine = load_wine()
X = wine.data

# 1. Create a pipeline that first applies standard scaling and then KMeans clustering
pipeline = Pipeline([
    ('scaler', ___()), # 1. Initialize the scaler
    ('kmeans', ___(n_clusters=___, random_state=0)) # Create the clustering algorithm with 3 clusters
])

# 2. Apply the pipeline to the data
clusters = pipeline.___(X)

# Result distribution
plt.title('Cluster assignments:')
sns.countplot(x=clusters)
plt.show()
