Challenge: Creating Word Embeddings
Task
Now, it's time for you to train a Word2Vec model to generate word embeddings for the given corpus:
- Import the class for creating a Word2Vec model.
- Tokenize each sentence in the 'Document' column of the corpus by splitting each sentence into words separated by whitespace. Store the result in the sentences variable.
- Initialize the Word2Vec model by passing sentences as the first argument and setting the following values as keyword arguments, in this order:
  - embedding size: 50;
  - context window size: 2;
  - minimal frequency of words to include in the model: 1;
  - model: skip-gram.
- Print the top 3 most similar words to the word 'bowl'.
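
Word2Vec expects its training corpus as a list of tokenized sentences, i.e. a list of lists of words. A minimal sketch of the tokenization step above, using made-up documents in place of the actual 'Document' column:

```python
# Hypothetical documents standing in for the corpus's 'Document' column
docs = ["the cat sat", "the dog ran"]

# Split each sentence on whitespace, mirroring str.split() on the column
sentences = [doc.split() for doc in docs]
print(sentences)  # [['the', 'cat', 'sat'], ['the', 'dog', 'ran']]
```

Each inner list is one tokenized sentence, which is the format the model's first argument should have.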
Solution
# Import the class for creating a Word2Vec model
from gensim.models import Word2Vec
import pandas as pd
corpus = pd.read_csv(
'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Tokenize each sentence by splitting on whitespace
sentences = corpus['Document'].str.split()
# Initialize the model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
# Print top-3 most similar words to 'bowl'
print(model.wv.most_similar('bowl', topn=3))
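
Under the hood, most_similar ranks words by the cosine similarity between their embedding vectors and the query vector. A minimal sketch of that ranking logic, using tiny made-up 3-dimensional vectors rather than the model's real 50-dimensional embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up embeddings for illustration only
vectors = {
    "bowl": [1.0, 0.2, 0.0],
    "cup":  [0.9, 0.3, 0.1],
    "run":  [0.0, 1.0, 0.8],
}

query = vectors["bowl"]
# Rank every other word by similarity to 'bowl', highest first
ranked = sorted((w for w in vectors if w != "bowl"),
                key=lambda w: cosine(vectors[w], query), reverse=True)
print(ranked)  # ['cup', 'run']
```

model.wv.most_similar('bowl', topn=3) performs the same kind of ranking over the whole vocabulary and returns the top 3 (word, similarity) pairs.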
Section 4. Chapter 4
Starter code
# Import the class for creating a Word2Vec model
from ___ import ___
import pandas as pd
corpus = pd.read_csv(
'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Tokenize each sentence by splitting on whitespace
sentences = ___
# Initialize the model
model = ___
# Print top-3 most similar words to 'bowl'
print(___)