Comparing the Dynamics
That's an interesting result! The yearly average temperatures differ markedly for 3 of the clusters (47.3, 60.9, and 79.24). It seems like a good split.
Now let's visualize the monthly dynamics of average temperatures across clusters and compare the result with the 5 clusters produced by the K-Means algorithm. The respective line plot is below.
Task
Visualize the monthly temperature dynamics across clusters. Follow these steps:
- Import the KMedoids class from sklearn_extra.cluster.
- Create a KMedoids object named model with 4 clusters.
- Fit the 3-15 columns (these are not indices, but positions) of data to model.
- Add the 'prediction' column to data with the labels predicted by model.
- Calculate the monthly averages using data and save the result in the d DataFrame:
  - Group the observations by the 'prediction' column.
  - Calculate the mean values.
  - Stack the columns into indices (already done).
  - Reset the indices.
- Assign ['Group', 'Month', 'Temp'] as the column names of d.
- Build a lineplot with 'Month' on the x-axis and 'Temp' on the y-axis for each 'Group' of the d DataFrame (i.e. a separate line and color for each 'Group').
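The group, mean, stack, and reset chain is the least obvious part of these steps, so here is a minimal sketch of what it does to the table's shape. It uses a small made-up DataFrame (the toy name, its three month columns, and all values are assumptions for illustration, not the course dataset) and only pandas.

```python
import pandas as pd

# Hypothetical toy data: four cities, three monthly columns, and cluster labels.
# These values are invented for illustration and are not from the course dataset.
toy = pd.DataFrame({
    'Jan': [10.0, 20.0, 30.0, 40.0],
    'Feb': [12.0, 22.0, 32.0, 42.0],
    'Mar': [14.0, 24.0, 34.0, 44.0],
    'prediction': [0, 0, 1, 1],
})

# Wide table: one row per cluster, one column per month
wide = toy.groupby('prediction').mean()

# stack() moves the month columns into a second index level;
# reset_index() turns both index levels back into ordinary columns
tidy = wide.stack().reset_index()
tidy.columns = ['Group', 'Month', 'Temp']

print(tidy)
```

The resulting table has one row per (Group, Month) pair, which is the long format that a line plot with a hue argument expects.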
Solution
# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn_extra.cluster import KMedoids
# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)
# Create model
model = KMedoids(n_clusters = 4)
# Fit the data to model
model.fit(data.iloc[:,2:-1])
# Add new column to DataFrame
data['prediction'] = model.predict(data.iloc[:,2:-1])
# Extract the list of the columns
col = list(data.columns[2:14])
col.append('prediction')
# Calculate the monthly mean averages for each cluster
d = data[col].groupby('prediction').mean().stack().reset_index()
# Assign new column names
d.columns = ['Group', 'Month', 'Temp']
# Visualize the results
sns.lineplot(x = 'Month', y = 'Temp', hue = 'Group', data = d)
plt.show()
Section 2. Chapter 6
# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from ___ import ___
# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col = 0)
# Create model
model = ___(___)
# Fit the data to model
___.___(data.iloc[:,2:-1])
# Add new column to DataFrame
data['prediction'] = ___.___(data.iloc[:,2:-1])
# Extract the list of the columns
col = list(data.columns[2:14])
col.append('prediction')
# Calculate the monthly mean averages for each cluster
d = data[col].___('prediction').___().stack().___()
# Assign new column names
d.___ = ['Group', 'Month', 'Temp']
# Visualize the results
sns.___(x = '___', y = '___', hue = '___', data = ___)
___