Challenge: Tokenizing Using Regex | Text Preprocessing Fundamentals
Introduction to NLP

Challenge: Tokenizing Using Regex

Task

Given a string named message, convert it to lowercase, then tokenize it into words using regular-expression tokenization and the corresponding nltk class. A word is a sequence of alphanumeric characters only (letters and digits). For example, '#Conference2023!' contains one word: Conference2023.
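
To make the word definition concrete, here is a minimal sketch that uses only Python's standard `re` module rather than nltk. The character class `[a-z0-9]+` is an assumption chosen to mirror "letters and numbers" for lowercase ASCII text; it is not the pattern the challenge expects you to use.

```python
import re

# Sample input taken from the task's own example.
sample = "#Conference2023!"

# After lowercasing, [a-z0-9]+ matches each maximal run of ASCII letters and digits.
# Punctuation such as '#' and '!' is never part of a match, so it is dropped.
words = re.findall(r"[a-z0-9]+", sample.lower())
print(words)  # ['conference2023']
```

The leading `#` and trailing `!` act only as boundaries, which is why the string yields exactly one word.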

Solution

# Import the necessary class from NLTK
from nltk.tokenize import RegexpTokenizer
message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
# Convert the message to lowercase
message = message.lower()
# Define a tokenizer that splits the text into words
word_tokenizer = RegexpTokenizer(r'\w+')
# Tokenize the text
words = word_tokenizer.tokenize(message)
print(words)
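
As a cross-check that does not require nltk, the sketch below applies the same `\w+` pattern with `re.findall` from the standard library; for a tokenizer built with default settings this should give the same tokens, though that equivalence is stated here as an assumption rather than taken from the course. It also illustrates one caveat: `\w` matches the underscore as well as letters and digits, so it is slightly broader than the task's "alphanumeric only" definition. The string `snake_case_tag` is a hypothetical input used only to show the difference.

```python
import re

message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"

# Same pattern as the solution, applied with the standard library.
tokens = re.findall(r"\w+", message.lower())
print(tokens)
# ['amazing', 'event', 'at', 'conference2023', 'over', '1000',
#  'attendees', 'from', '20', 'countries', 'networking', 'tech']

# Caveat: \w also matches '_', so underscores stay inside a token.
loose = re.findall(r"\w+", "snake_case_tag")
# A stricter class keeps only lowercase ASCII letters and digits.
strict = re.findall(r"[a-z0-9]+", "snake_case_tag")
print(loose)   # ['snake_case_tag']
print(strict)  # ['snake', 'case', 'tag']
```

The sample message contains no underscores, so both patterns produce the same result for it.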

Section 1. Chapter 6
# Import the necessary class from NLTK
from ___ import ___
message = "Amazing event at #Conference2023! Over 1000 attendees from 20+ countries. #Networking #Tech"
# Convert the message to lowercase
message = ___
# Define a tokenizer that splits the text into words
word_tokenizer = ___
# Tokenize the text
words = ___
print(words)