Identifying the Most Frequent Words in Text

Data Visualization

Now that we've covered the key features of the nltk package, let's move on to visualizing our data. We'll begin by calculating word frequencies and then displaying these frequencies using a bar plot.
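For instance, the frequency-counting step on its own looks like the minimal sketch below. The `tokens` list here is just a made-up example, not the course text, and stands in for your tokenized story:

import nltk

# A tiny hypothetical list of tokens, standing in for the tokenized text
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]

# FreqDist counts how often each token appears
fdist = nltk.FreqDist(tokens)

# most_common(n) returns the n most frequent (word, count) pairs
print(fdist.most_common(2))  # [('the', 3), ('cat', 2)]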

Task


  1. Calculate the frequency distribution of the tokenized words from your text.
  2. Find the top 30 most common words in this distribution.

Solution

# Import nltk for frequency distributions and matplotlib for visualization
import nltk
import matplotlib.pyplot as plt

# Create a single subplot for the bar chart
fig, axs = plt.subplots(nrows=1, ncols=1, figsize=(18, 5))

# Calculate the frequency distribution of the tokenized words
fdist = nltk.FreqDist(story_tokenized)

# Find the top 30 most common words as (word, count) pairs
top_30_words = fdist.most_common(30)

# Plot a bar chart for the top 30 words
axs.bar([word for word, count in top_30_words], [count for word, count in top_30_words])

# Add a text label above each bar showing the word's count
for i, (word, count) in enumerate(top_30_words):
    axs.text(i, count, str(count))

# Rotate the x-axis labels so the words stay readable
axs.set_xticks(range(len(top_30_words)))
axs.set_xticklabels([word for word, count in top_30_words], rotation=45)

axs.set_title("Top 30 Words")
axs.set_xlabel("Word")
axs.set_ylabel("Count")
plt.show()
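As a quick alternative worth knowing, nltk's FreqDist also ships with a built-in plot() helper that draws a simple line plot of the most frequent words. It is handy for a fast sanity check before building a customized bar chart; this sketch again assumes story_tokenized is already defined:

# Quick check: FreqDist's built-in plotting helper (line plot of the top 30 words)
fdist = nltk.FreqDist(story_tokenized)
fdist.plot(30)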
