Challenge: Creating Word Embeddings
Task
Now it's time for you to train a Word2Vec model to generate word embeddings for the given corpus:
- Import the class for creating a Word2Vec model.
- Tokenize each sentence in the 'Document' column of the corpus by splitting it into words on whitespace. Store the result in the sentences variable.
- Initialize the Word2Vec model by passing sentences as the first argument and setting the following values as keyword arguments, in this order:
  - embedding size: 50;
  - context window size: 2;
  - minimal frequency of words to include in the model: 1;
  - model: skip-gram.
- Print the top-3 most similar words to the word 'bowl'.
Solution
# Import the class for creating a Word2Vec model
from gensim.models import Word2Vec
import pandas as pd
corpus = pd.read_csv(
'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Tokenize each sentence into a list of words
sentences = corpus['Document'].str.split()
# Initialize the model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
# Print top-3 most similar words to 'bowl'
print(model.wv.most_similar('bowl', topn=3))
Section 4. Chapter 4
# Import the class for creating a Word2Vec model
from ___ import ___
import pandas as pd
corpus = pd.read_csv(
'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Tokenize each sentence into a list of words
sentences = ___
# Initialize the model
model = ___
# Print top-3 most similar words to 'bowl'
print(___)