Words Count | Tweet Sentiment Analysis
Words Count

Now we would like to find the most frequent words in our DataFrame. To do this, we will create a collection that stores how often each word occurs and then display the most common words as a styled table.

Methods description

  • from collections import Counter; import nltk: Imports the Counter class from the collections module and the nltk library;
  • from nltk.corpus import stopwords: Imports the stopwords corpus reader from NLTK, which provides lists of common stopwords;
  • nltk.download("stopwords"): Downloads the stopwords dataset from NLTK;
  • def remove_stopword(x): Defines a function named remove_stopword that takes a list of words x as input and returns a new list with the stopwords removed;
  • return [y for y in x if y not in stopwords.words("english")]: A list comprehension that keeps only the words in x that do not appear in NLTK's list of English stopwords;
  • Counter: A class from the collections module used to count occurrences of elements in a list or iterable;
  • stopwords.words("english"): A method from NLTK that returns a list of stopwords for the English language;
  • temp.most_common(25): Returns the 25 most common elements (words) and their counts from the Counter object temp;
  • temp.iloc[1:,:]: Indexes a DataFrame temp to exclude the first row and select all columns;
  • temp.style.background_gradient(...): Applies a background gradient style to a DataFrame temp.
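
The counting steps described above can be sketched on toy data without downloading anything. This is a minimal sketch, not the lesson's solution: SAMPLE_STOPWORDS is a hypothetical stand-in for stopwords.words("english"), and the three example tweets are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for stopwords.words("english"); the real NLTK list is much longer.
SAMPLE_STOPWORDS = {"the", "is", "a", "to", "and"}

def remove_stopword(tokens, stopword_set=SAMPLE_STOPWORDS):
    """Return the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopword_set]

# Made-up tweets, already split into words as in the lesson.
tweets = [
    "the weather is great today".split(),
    "great day to code".split(),
    "code and coffee is a great combo".split(),
]

# Filter each tweet, then flatten all remaining words into one Counter.
filtered = [remove_stopword(tokens) for tokens in tweets]
word_counts = Counter(word for tokens in filtered for word in tokens)

# most_common(n) returns (word, count) pairs sorted by descending count.
top_two = word_counts.most_common(2)
print(top_two)  # [('great', 3), ('code', 2)]
```

The list of (word, count) pairs returned by most_common is what the lesson passes to pd.DataFrame to build the table.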
Task

Create a collection to count word occurrences using the Counter class:

  1. Remove stopwords from the tweet texts.
  2. Create a Counter collection from the remaining words.
  3. Create a DataFrame from the 25 most common words.
  4. Apply a background gradient with the "Blues" color map.

Solution

from collections import Counter
import nltk
from nltk.corpus import stopwords
nltk.download("stopwords")

def remove_stopword(x):
    return [y for y in x if y not in stopwords.words("english")]

data["temp_list1"] = data["text"].apply(lambda x: str(x).split())  # Split each tweet's text into a list of words
data["temp_list1"] = data["temp_list1"].apply(lambda x: remove_stopword(x))  # Remove stopwords from each list

top = Counter([item for sublist in data["temp_list1"] for item in sublist])
temp = pd.DataFrame(top.most_common(25))
temp = temp.iloc[1:,:]
temp.columns = ["Common_words", "count"]
temp.style.background_gradient(cmap = "Blues")
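
One detail worth noting about the solution: NLTK's English stopword list is lowercase, so a capitalized token such as "The" passes through a case-sensitive filter. The solution also calls stopwords.words("english") once for every word it checks. The sketch below shows one possible variant, assuming a stopword set built once and a lowercase comparison. STOPWORD_SET is a hypothetical stand-in for set(stopwords.words("english")), and remove_stopword_ci is an illustrative name, not part of the lesson.

```python
# Hypothetical stand-in for set(stopwords.words("english")), built once up front.
STOPWORD_SET = frozenset({"the", "is", "a", "i", "to"})

def remove_stopword_ci(tokens, stopword_set=STOPWORD_SET):
    """Drop tokens whose lowercase form is a stopword; keep the original casing otherwise."""
    return [t for t in tokens if t.lower() not in stopword_set]

tokens = "The movie is GREAT".split()

# Case-sensitive filtering, as in the lesson's solution: "The" survives.
case_sensitive = [t for t in tokens if t not in STOPWORD_SET]

# Lowercase comparison: "The" is removed as well.
case_insensitive = remove_stopword_ci(tokens)

print(case_sensitive)    # ['The', 'movie', 'GREAT']
print(case_insensitive)  # ['movie', 'GREAT']
```

Whether this variant is preferable depends on the earlier cleaning steps: if the tweet text has already been lowercased, the lesson's filter behaves the same way on casing.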
