Section 4. Chapter 3
Challenge: Clean Messy Reviews
Task
You are given a list of customer review texts in the variable reviews.
The reviews may contain emojis, hashtags, repeated characters, noise words, punctuation, and informal expressions.
Your goal is to create a normalized version of each review using several NLP cleaning steps.
Follow these steps:
- Convert each review to lowercase.
- Remove emojis, hashtags, and mentions using a regular expression.
- Normalize repeated characters: any character repeated 3 or more times should be reduced to a single instance (coooool→cool).
- Tokenize each review using nltk.word_tokenize().
- Remove stopwords using the provided stopwords list.
- Apply stemming to the remaining tokens using PorterStemmer.
- Store each cleaned review (joined back with spaces) in a list named cleaned_reviews.
Make sure the variable cleaned_reviews is declared and contains all normalized reviews in the correct order.
Solution
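One possible way to work through the steps above is sketched here. It is a minimal sketch under stated assumptions, not the reference solution. The sample reviews and the short stopwords list are made up for illustration, and the helper names strip_noise, collapse_repeats, tokenize, stem, and clean_review are hypothetical. Emojis are removed by dropping every non-ASCII character, which is a simplification that would also strip accented letters. If NLTK or its tokenizer data is not available, the sketch falls back to a plain regex tokenizer and skips stemming so that it still runs. Note that applying the stated rule literally turns coooool into col. Reproducing the example's cool would require keeping two characters instead (replacement \1\1), so it is worth checking which behavior the checker expects.

```python
import re

try:
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize
    _stemmer = PorterStemmer()
except ImportError:
    # Assumption: without NLTK the sketch degrades instead of failing.
    _stemmer = None
    word_tokenize = None

# Hypothetical inputs; in the exercise, reviews and stopwords are provided.
reviews = [
    "This phone is AMAZIIIING!!! 😍 #bestbuy @techstore",
    "Sooooo slow... I am not happy 😡 #fail",
]
stopwords = ["this", "is", "i", "am", "the", "a"]


def strip_noise(text):
    """Remove hashtags, mentions, and (as a simplification) all non-ASCII characters."""
    text = re.sub(r"[#@]\w+", "", text)
    return re.sub(r"[^\x00-\x7f]", "", text)


def collapse_repeats(text):
    """Reduce any character repeated 3 or more times to a single instance."""
    return re.sub(r"(.)\1{2,}", r"\1", text)


def tokenize(text):
    """Use nltk.word_tokenize when available, else a simple regex tokenizer."""
    if word_tokenize is not None:
        try:
            return word_tokenize(text)
        except LookupError:
            pass  # tokenizer data has not been downloaded
    return re.findall(r"[a-z0-9']+", text)


def stem(token):
    """Apply the Porter stemmer if NLTK was imported; otherwise return the token unchanged."""
    return _stemmer.stem(token) if _stemmer else token


def clean_review(text, stop_words):
    text = collapse_repeats(strip_noise(text.lower()))
    stop_set = set(stop_words)
    tokens = [t for t in tokenize(text) if t not in stop_set]
    return " ".join(stem(t) for t in tokens)


# Cleaned reviews, in the same order as the input list.
cleaned_reviews = [clean_review(r, stopwords) for r in reviews]
print(cleaned_reviews)
```

The task does not list punctuation removal as a step, so tokens such as "!" produced by nltk.word_tokenize() are left in place here. Filtering them out would be an extra step beyond the ones stated.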