Challenge: Tokenizing Using Regex
Task
Given a string named message, convert it to lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.
Solution
# Import the necessary class from NLTK
from nltk.tokenize import RegexpTokenizer
message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
# Convert the message to lowercase
message = message.lower()
# Define a tokenizer that splits the text into words
word_tokenizer = RegexpTokenizer(r'\w+')
# Tokenize the text
words = word_tokenizer.tokenize(message)
print(words)
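One detail worth checking: the task defines a word as letters and numbers only, while \w also matches the underscore. The sketch below uses Python's standard re module rather than nltk, on the assumption that re.findall returns the same matches as the tokenizer for these simple patterns. The stricter pattern [a-z0-9]+ and the sample string are illustrative assumptions, not part of the original exercise, and the pattern only covers ASCII letters and digits.

```python
import re

message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
lowered = message.lower()

# Pattern used in the solution: \w matches letters, digits, and the underscore
loose = re.findall(r"\w+", lowered)

# Stricter ASCII-only alternative (an assumption for illustration)
strict = re.findall(r"[a-z0-9]+", lowered)

print(loose)
print(loose == strict)  # identical here, because the message has no underscores

# Hypothetical input with underscores, where the two patterns differ
sample = "snake_case_tag and #Tech2023".lower()
print(re.findall(r"\w+", sample))         # underscores stay inside the token
print(re.findall(r"[a-z0-9]+", sample))   # underscores act as separators
```

For the message in this exercise both patterns give the same tokens, so \w+ is sufficient. If the input may contain underscores, passing the stricter pattern to RegexpTokenizer would match the task's definition of a word more closely.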
Section 1. Chapter 6
# Import the necessary class from NLTK
from ___ import ___
message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
# Convert the message to lowercase
message = ___
# Define a tokenizer that splits the text into words
word_tokenizer = ___
# Tokenize the text
words = ___
print(words)