Comparing the Trends Across Clusters

Both cases look reasonable, or at least nothing is obviously wrong with either. Let's compare the month-by-month dynamics for both cases.

You might remember from the previous sections how the algorithms predicted the dynamics. Spectral clustering with 4 clusters predicts the following dynamics.

# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)

# Create the model
model = SpectralClustering(n_clusters = 4, affinity = 'nearest_neighbors')

# Fit the data and predict the labels
data['prediction'] = model.fit_predict(data.iloc[:,2:14])

# Extract the list of the monthly columns and add the label column
col = list(data.columns[2:14])
col.append('prediction')

# Calculate the monthly means for each cluster
d = data[col].groupby('prediction').mean().stack().reset_index()

# Assign new column names
d.columns = ['Group', 'Month', 'Temp']

# Visualize the results
sns.lineplot(x = 'Month', y = 'Temp', hue = 'Group', data = d)
plt.show()



Quite an interesting result! Even with 4 clusters, the spectral clustering algorithm catches the group whose temperature goes downwards up to summer. Let's find out what the fifth line produced by this algorithm will be.
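Before adding the fifth cluster, it can help to see how the 4-cluster and 5-cluster labelings relate to each other. The following is only a minimal sketch on synthetic data, not the course dataset: the made-up temperature table `X`, the helper `cluster_labels`, and the fixed `random_state` are all assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import SpectralClustering

# Synthetic stand-in for the weather table: 60 "cities" x 12 monthly temperatures
rng = np.random.default_rng(0)
months = np.arange(12)
season = np.cos(2 * np.pi * (months - 6) / 12)   # seasonal shape, peak mid-year
mean_temp = rng.uniform(5, 25, size=(60, 1))     # made-up yearly mean per city
amplitude = rng.uniform(-10, 15, size=(60, 1))   # negative = inverted seasons
X = mean_temp + amplitude * season + rng.normal(0, 1.0, size=(60, 12))

def cluster_labels(n_clusters):
    # Hypothetical helper; random_state is fixed only to make the sketch repeatable
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity='nearest_neighbors',
                               random_state=0)
    return model.fit_predict(X)

labels_4 = cluster_labels(4)
labels_5 = cluster_labels(5)

# Rows: labels with 4 clusters, columns: labels with 5 clusters.
# A row spread over two columns suggests that group was split in two.
overlap = pd.crosstab(pd.Series(labels_4, name='k=4'),
                      pd.Series(labels_5, name='k=5'))
print(overlap)
```

Cluster numbers are arbitrary, so label 0 in one model need not match label 0 in the other; the cross-tabulation only shows which cities stay grouped together.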

Task

Import the SpectralClustering class from sklearn.cluster.
Create a SpectralClustering model named model with 5 clusters and 'nearest_neighbors' affinity.
Fit model to columns 3 through 14 of data (data.iloc[:,2:14]) and predict the labels. Save them in the 'prediction' column of data.
Calculate the mean for each month and store the result in the monthly_data variable:

Group the observations in the col columns by the 'prediction' column.
Calculate the mean within each group.
Stack the table.
Reset the index.

Rename the columns of the newly created DataFrame monthly_data to ['Group', 'Month', 'Temp'].
Build a seaborn line plot of 'Temp' against 'Month' with one line per 'Group' value. Display the plot.
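The group, mean, stack, and reset steps above can be traced on a tiny table. This is a sketch with made-up numbers (four cities, two months, two clusters), not the course data; the names `toy`, `wide`, and `tidy` are illustrative only.

```python
import pandas as pd

# Toy data (made up for illustration): four cities, two months, two clusters
toy = pd.DataFrame({
    'Jan': [0.0, 2.0, 20.0, 22.0],
    'Feb': [1.0, 3.0, 21.0, 23.0],
    'prediction': [0, 0, 1, 1],
})

# Mean per cluster: one row per cluster, one column per month
wide = toy.groupby('prediction').mean()

# stack() moves the month columns into an inner index level;
# reset_index() turns both index levels back into ordinary columns
tidy = wide.stack().reset_index()
tidy.columns = ['Group', 'Month', 'Temp']

print(tidy)
```

The result has one row per (cluster, month) pair, which is the long layout that seaborn's lineplot expects when hue is used to draw one line per group.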

Solution

# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)

# Create the model with 5 clusters, as the task requires
model = SpectralClustering(n_clusters = 5, affinity = 'nearest_neighbors')

# Fit the data and predict the labels
data['prediction'] = model.fit_predict(data.iloc[:,2:14])

# Extract the list of the monthly columns and add the label column
col = list(data.columns[2:14])
col.append('prediction')

# Calculate the monthly means for each cluster
monthly_data = data[col].groupby('prediction').mean().stack().reset_index()

# Assign new column names
monthly_data.columns = ['Group', 'Month', 'Temp']

# Visualize the results
sns.lineplot(x = 'Month', y = 'Temp', hue = 'Group', data = monthly_data)
plt.show()


# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
___

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)

# Create the model
model = ___(___, ___)

# Fit the data and predict the labels
data['___'] = ___.___(___)

# Extract the list of the monthly columns and add the label column
col = list(data.columns[2:14])
col.append('prediction')

# Calculate the monthly means for each cluster