Cluster Analysis in Python

Deciding the Number of Clusters

Well done! Let's look one more time at all the dendrograms for the weather data.

As you can see, the single linkage dendrogram is unreadable. The average linkage method most likely leads to three clusters: if you draw a horizontal line between 75 and 100, it intersects one blue line and two green lines. The complete and Ward linkage methods lead to four clusters. For complete linkage, you can draw the horizontal line between 120 and 150 (it intersects two orange and two green lines); for Ward linkage, between 400 and 600. Let's see what results we get using three clusters with average linkage.
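Counting the vertical lines that a horizontal cut intersects is the same as counting the clusters that exist at that height. A minimal sketch of this idea, assuming SciPy's `linkage` and `fcluster` functions and using synthetic points as a stand-in for the weather data (the blobs, the cut height of 20, and the helper `clusters_at_height` are illustrative assumptions, not part of the lesson):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for the weather data (assumption): three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [50.0, 50.0], [100.0, 0.0]])
points = np.vstack([c + rng.normal(scale=1.0, size=(20, 2)) for c in centers])

# Build the average-linkage hierarchy, the same structure a dendrogram draws
Z = linkage(points, method='average')

def clusters_at_height(Z, height):
    """Hypothetical helper: number of clusters left after cutting at `height`."""
    labels = fcluster(Z, t=height, criterion='distance')
    return len(np.unique(labels))

# Cutting between the within-blob and between-blob merge heights
n_clusters = clusters_at_height(Z, 20.0)
print(n_clusters)
```

Trying a few cut heights this way can confirm what you read off the dendrogram before you commit to a value for `n_clusters`.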

Note that in the previous sections we considered the cases of 5 or 4 clusters. Let's see how it works with three.

Task

  1. Import the AgglomerativeClustering class from sklearn.cluster.
  2. Create an AgglomerativeClustering model object named model with 3 clusters and 'average' linkage.
  3. Fit the numerical data (columns 3 to 14) to model and predict the labels. Save the predicted labels as the 'prediction' column of data.
  4. To build the monthly_data DataFrame, group the observations in the columns listed in col by the 'prediction' column and calculate the mean within each group.
  5. Build a line plot of 'Temp' against 'Month' for each value of 'Group' using the monthly_data DataFrame.

Solution

# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)

# Create the model
model = AgglomerativeClustering(n_clusters = 3, linkage = 'average')

# Fit the data and predict the labels
data['prediction'] = model.fit_predict(data.iloc[:,2:14])

# Extract the list of the columns
col = list(data.columns[2:14])
col.append('prediction')

# Calculate the monthly mean temperatures for each cluster
monthly_data = data[col].groupby('prediction').mean().stack().reset_index()

# Assign new column names
monthly_data.columns = ['Group', 'Month', 'Temp']

# Visualize the results
sns.lineplot(x = 'Month', y = 'Temp', hue = 'Group', data = monthly_data)
plt.show()
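The `groupby(...).mean().stack().reset_index()` chain in the solution turns one row per cluster into one row per (cluster, month) pair, which is the long format that `sns.lineplot` expects. A small sketch of that reshaping on hypothetical toy data (the values and the two month columns are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: two month columns and a cluster label per city
toy = pd.DataFrame({
    'Jan': [0.0, 2.0, 20.0, 22.0],
    'Feb': [1.0, 3.0, 21.0, 23.0],
    'prediction': [0, 0, 1, 1],
})

# Wide format: one row per cluster, one column per month
wide = toy.groupby('prediction').mean()

# Long format: one row per (cluster, month) pair
long = wide.stack().reset_index()
long.columns = ['Group', 'Month', 'Temp']
print(long)
```

Each cluster contributes one row per month, so the plot draws one line per value of 'Group'.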


Section 3. Chapter 6
# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
___

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)

# Create the model
model = AgglomerativeClustering(n_clusters = ___, linkage = '___')

# Fit the data and predict the labels
data['prediction'] = model.___(data.___[:,2:14])

# Extract the list of the columns
col = list(data.columns[2:14])
col.append('prediction')

# Calculate the monthly mean temperatures for each cluster
monthly_data = data[col].groupby('___').___().stack().reset_index()

# Assign new column names
monthly_data.columns = ['Group', 'Month', 'Temp']

# Visualize the results
sns.lineplot(x = '___', y = '___', hue = '___', data = monthly_data)
plt.show()
