Challenge: Creating a Bag of Words | Basic Text Models
Introduction to NLP

Challenge: Creating a Bag of Words

Task


Your task is to display the vector for the 'graphic design' bigram in a BoW model:

  1. Import the CountVectorizer class to create a BoW model.

  2. Instantiate the CountVectorizer class as count_vectorizer, configuring it for a frequency-based model that includes both unigrams and bigrams.

  3. Utilize the appropriate method of count_vectorizer to generate a BoW matrix from the 'Document' column in the corpus.

  4. Convert bow_matrix to a dense array and create a DataFrame from it, setting the unique features (unigrams and bigrams) as its columns. Assign this to the variable bow_df.

  5. Display the vector for 'graphic design' as an array, rather than as a pandas Series.

Solution

# Import CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
corpus = pd.read_csv(
'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Instantiate CountVectorizer
count_vectorizer = CountVectorizer(ngram_range=(1, 2))
# Generate a BoW matrix
bow_matrix = count_vectorizer.fit_transform(corpus['Document'])
# Convert the resulting matrix to a DataFrame
bow_df = pd.DataFrame(bow_matrix.toarray(), columns=count_vectorizer.get_feature_names_out())
# Print the vector for "graphic design"
print(bow_df['graphic design'].values)
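
To see what the 'graphic design' vector represents, the counting can be sketched by hand in plain Python. This is a minimal illustration, not scikit-learn's implementation: the three documents and the helper function ngrams are made up for the example, and the tokenization only approximates CountVectorizer's defaults (lowercasing, tokens of two or more word characters).

```python
import re
from collections import Counter

# Hypothetical three-document corpus, used only for illustration.
docs = [
    "Graphic design is my passion",
    "Graphic design tools help graphic design teams",
    "I study machine learning",
]

def ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max from one document."""
    # Lowercase and keep tokens of two or more word characters,
    # which approximates CountVectorizer's default tokenization.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

# Count n-grams per document, then build a sorted vocabulary.
counts = [Counter(ngrams(doc)) for doc in docs]
vocabulary = sorted(set().union(*counts))

# One row per document, one column per feature (unigram or bigram).
bow_rows = [[c[term] for term in vocabulary] for c in counts]

# The vector for a feature is its column: one count per document.
column = vocabulary.index("graphic design")
graphic_design_vector = [row[column] for row in bow_rows]
print(graphic_design_vector)  # [1, 2, 0]
```

The second entry is 2 rather than 1 because the model is frequency-based: it counts every occurrence of the bigram in a document instead of only recording whether it is present.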


Section 3. Chapter 5
# Import CountVectorizer
from ___ import ___
import pandas as pd
corpus = pd.read_csv(
'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Instantiate CountVectorizer
count_vectorizer = ___
# Generate a BoW matrix
bow_matrix = count_vectorizer.___
# Convert the resulting matrix to a DataFrame
bow_df = pd.___(bow_matrix.___, columns=___)
# Print the vector for "graphic design"
print(___)
