Introduction to NLP

Challenge: Creating Word Embeddings

Task

Train a Word2Vec model to generate word embeddings for the given corpus:

  1. Import the class for creating a Word2Vec model.
  2. Tokenize each sentence in the 'Document' column of the corpus by splitting it into words on whitespace. Store the result in the sentences variable.
  3. Initialize the Word2Vec model by passing sentences as the first argument and setting the following values as keyword arguments, in this order:
    • embedding size: 50;
    • context window size: 2;
    • minimum frequency a word needs to be included in the model: 1;
    • model architecture: skip-gram.
  4. Print the top-3 most similar words to the word 'bowl'.
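
To make the window and skip-gram settings in step 3 more concrete, here is a minimal sketch in plain Python of how (target, context) pairs arise from a context window of size 2. The function name skipgram_pairs and the sample sentence are assumptions made up for illustration; they are not part of gensim or of the exercise corpus. Real implementations typically add details such as random window shrinking and subsampling, which this sketch ignores.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)                # leftmost context position
        hi = min(len(tokens), i + window + 1)  # one past the rightmost position
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

# Hypothetical sentence (not from the corpus), tokenized on whitespace as in step 2
tokens = "she ate soup from a bowl".split()
pairs = skipgram_pairs(tokens, window=2)

# Context words seen for the target 'bowl' with a window of 2
bowl_contexts = [context for target, context in pairs if target == "bowl"]
print(bowl_contexts)  # ['from', 'a']
```

In skip-gram mode the model learns to predict each context word from its target word, so a larger window would pair 'bowl' with more distant words as well.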

Solution

# Import the class for creating a Word2Vec model
from gensim.models import Word2Vec
import pandas as pd
# Load the example corpus into a DataFrame
corpus = pd.read_csv(
    'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Tokenize each sentence by splitting on whitespace
sentences = corpus['Document'].str.split()
# Initialize the model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
# Print top-3 most similar words to 'bowl'
print(model.wv.most_similar('bowl', topn=3))
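
most_similar ranks the other words in the vocabulary by the cosine similarity between their vectors and the vector of the query word. As a rough sketch of that ranking, the snippet below uses tiny hand-made 3-dimensional vectors. The words, the values, and the helper names cosine and top_n are made up for illustration; they are not output from the trained model and not gensim functions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n(word, vectors, n=3):
    """Return the n words most similar to `word`, highest similarity first."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Toy vectors (assumed values, for illustration only)
toy_vectors = {
    "bowl":  [1.0, 0.0, 0.0],
    "plate": [0.9, 0.1, 0.0],
    "cup":   [0.7, 0.7, 0.0],
    "sky":   [0.0, 0.0, 1.0],
    "spoon": [0.8, 0.0, 0.6],
}

ranked = top_n("bowl", toy_vectors, n=3)
print([word for word, _ in ranked])  # ['plate', 'spoon', 'cup']
```

With a real model the vectors have 50 dimensions instead of 3, and on a small corpus the neighbours returned can vary between training runs.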

Section 4. Chapter 4
# Import the class for creating a Word2Vec model
from ___ import ___
import pandas as pd
# Load the example corpus into a DataFrame
corpus = pd.read_csv(
    'https://content-media-cdn.codefinity.com/courses/c68c1f2e-2c90-4d5d-8db9-1e97ca89d15e/section_3/chapter_4/example_corpus.csv')
# Tokenize each sentence by splitting on whitespace
sentences = ___
# Initialize the model
model = ___
# Print top-3 most similar words to 'bowl'
print(___)
