Learn Challenge: Stemming the Tokens | Stemming and Lemmatization
Introduction to NLP

Challenge: Stemming the Tokens

Task


Your task is the following:

  1. Import Porter Stemmer.
  2. Convert text to lowercase.
  3. Tokenize the text string.
  4. Load English stop words.
  5. Filter out the stop words using list comprehension.
  6. Create a stemmer object.
  7. Stem the tokens using list comprehension.
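
Steps 5 and 7 share one pattern: a list comprehension that either keeps tokens passing a test or applies a function to each token. The sketch below shows that pattern in plain Python. The stop-word set `toy_stop_words` and the function `toy_stem` are hypothetical stand-ins made up for illustration; they are not NLTK's stop-word list or the Porter algorithm, and real stemmer output will differ.

```python
# Hypothetical stand-ins for illustration only:
# NOT NLTK's stop-word list and NOT the Porter stemming algorithm.
toy_stop_words = {"the", "of", "and"}


def toy_stem(token):
    """Strip a trailing 'ing' or 's' when at least 3 characters remain."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# A naive whitespace split stands in for a real tokenizer here.
tokens = "the pouring rain and the sense of joy".split()

# Filtering: keep only the tokens that are not stop words.
filtered = [token for token in tokens if token not in toy_stop_words]

# Mapping: apply the stemming function to every remaining token.
stemmed = [toy_stem(token) for token in filtered]

print(stemmed)  # ['pour', 'rain', 'sense', 'joy']
```

In the actual task, the membership test runs against the set built from NLTK's English stop words, and the mapping step calls the stemmer object's `stem` method instead of `toy_stem`.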

Solution

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# Import Porter Stemmer
from nltk.stem import PorterStemmer
import nltk
nltk.download('punkt_tab')
nltk.download('stopwords')
text = "Despite the pouring rain, the overwhelming sense of joy and accomplishment made the day unforgettable!"
# Convert the text to lowercase
text = text.lower()
# Tokenization
tokens = word_tokenize(text)
# Load English stop words
stop_words = set(stopwords.words('english'))
# Remove stop words (use list comprehension)
filtered_tokens = [token for token in tokens if token.lower() not in stop_words]
# Create a stemmer object
stemmer = PorterStemmer()
# Stemming (use list comprehension)
stemmed_tokens = [stemmer.stem(token) for token in filtered_tokens]
print("Stemmed tokens:", stemmed_tokens)


Section 2. Chapter 2
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# Import Porter Stemmer
from ___ import ____
import nltk
nltk.download('punkt_tab')
nltk.download('stopwords')
text = "Despite the pouring rain, the overwhelming sense of joy and accomplishment made the day unforgettable!"
# Convert the text to lowercase
text = ___
# Tokenization
tokens = ___
# Load English stop words
stop_words = set(___)
# Remove stop words (use list comprehension)
filtered_tokens = [token for ___ in tokens if ___]
# Create a stemmer object
stemmer = ___
# Stemming (use list comprehension)
stemmed_tokens = [___ for token in ___]
print("Stemmed tokens:", stemmed_tokens)