Challenge: Tokenizing Using Regex
Task
Given a string named message, convert it to lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.
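To make the definition of a word concrete, the sketch below applies the pattern \w+ to the example from the task using Python's standard re module. It assumes that RegexpTokenizer in its default mode (gaps=False) returns the non-overlapping matches of its pattern, which is what re.findall does here. The name find_words is a hypothetical helper for illustration, not part of NLTK.

```python
import re

def find_words(text):
    """Lowercase the text and return every run of word characters."""
    return re.findall(r"\w+", text.lower())

# The '#' and '!' are not word characters, so only one token remains.
example_words = find_words("#Conference2023!")
print(example_words)  # ['conference2023']
```

The token comes out in lowercase because the task asks for the message to be lowercased before tokenizing.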
Solution
# Import the necessary class from NLTK
from nltk.tokenize import RegexpTokenizer
message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
# Convert the message to lowercase
message = message.lower()
# Define a tokenizer that splits the text into words
word_tokenizer = RegexpTokenizer(r'\w+')
# Tokenize the text
words = word_tokenizer.tokenize(message)
print(words)
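One caveat about the pattern used above: \w also matches the underscore, and in Python 3 it matches non-ASCII letters and digits as well, so it is slightly broader than "letters and numbers". For this message the difference does not matter. The sketch below uses re.findall from the standard library as a stand-in for the tokenizer (an assumption, based on the tokenizer's default match-finding mode) and compares \w+ with [^\W_]+, a stricter pattern that excludes underscores. The sample string is made up for illustration.

```python
import re

message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech".lower()

# For the challenge message, both patterns give the same tokens.
loose = re.findall(r"\w+", message)
strict = re.findall(r"[^\W_]+", message)
print(loose == strict)  # True

# A made-up string with an underscore shows where they differ.
sample = "user_name42 logged in"
print(re.findall(r"\w+", sample))      # ['user_name42', 'logged', 'in']
print(re.findall(r"[^\W_]+", sample))  # ['user', 'name42', 'logged', 'in']
```

If underscores should never be part of a word, the stricter pattern could be passed to RegexpTokenizer in place of r'\w+'.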
Section 1. Chapter 6
# Import the necessary class from NLTK
from ___ import ___
message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
# Convert the message to lowercase
message = ___
# Define a tokenizer that splits the text into words
word_tokenizer = ___
# Tokenize the text
words = ___
print(words)