Data Visualization
Now that we've covered the key features of the nltk package, let's move on to visualizing our data. We'll begin by calculating word frequencies and then display these frequencies in a bar plot.
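Before working with the full text, it may help to see the counting step on a tiny input. nltk.FreqDist is a subclass of Python's collections.Counter, so the sketch below uses Counter from the standard library to show the same most_common behavior. The token list is a made-up toy example, not the course's data.

```python
from collections import Counter

# Hypothetical toy token list (an assumption for illustration only)
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]

# Count how often each token occurs
freq = Counter(tokens)

# most_common(n) returns (word, count) pairs, highest count first
top_2 = freq.most_common(2)

# Split the pairs into separate sequences, the shape a bar plot expects
words, counts = zip(*top_2)

print(top_2)
print(words, counts)
```

Splitting the pairs into words and counts gives one sequence for the x-axis and one for the bar heights.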
Task
- Calculate the frequency distribution of the tokenized words from your text.
- Find the top 30 most common words in this distribution.
Solution
# Import nltk for frequency counting and matplotlib for data visualization
import nltk
import matplotlib.pyplot as plt

# Create a single subplot for a bar chart
fig, axs = plt.subplots(nrows=1, ncols=1, figsize=(18, 5))

# Calculate frequency distribution of the tokenized words
# (story_tokenized is assumed to be the token list defined earlier)
fdist = nltk.FreqDist(story_tokenized)

# Find the top 30 most common words as (word, count) pairs
top_30_words = fdist.most_common(30)
words = [word for word, count in top_30_words]
counts = [count for word, count in top_30_words]

# Plot a bar chart for the top 30 words
axs.bar(words, counts)

# Add text labels for each bar with the count of the words
for i, count in enumerate(counts):
    axs.text(i, count, str(count), ha="center", va="bottom")

# Fix the tick positions before setting rotated labels
axs.set_xticks(range(len(words)))
axs.set_xticklabels(words, rotation=45)
axs.set_title("Top 30 Words")
axs.set_xlabel("Word")
axs.set_ylabel("Count of Words")
plt.show()