Word Embeddings Basics
Introduction to NLP


Understanding Word Embeddings

Traditional text representation models like Bag of Words and TF-IDF advanced natural language processing, but they have significant limitations. Because they treat each word independently of its context, they fail to capture semantic relationships between words, and they produce high-dimensional, sparse matrices that are computationally inefficient for large text corpora.

Drawbacks

Word embeddings address these issues by considering the context in which words appear, providing a more nuanced understanding of language.

Word embeddings are dense vector representations of words in a continuous vector space where semantically similar words are mapped to proximate points.
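To make "proximate points" concrete, here is a minimal sketch using toy 3-dimensional vectors (the words and numbers are invented for illustration, not taken from a trained model). Cosine similarity is the standard way to measure how close two embedding vectors are:

```python
import math

# Toy 3-dimensional embeddings (illustrative values, not from a trained model):
# semantically related words are given similar vectors.
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 = very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much lower
```

With real embeddings the vectors have hundreds of dimensions, but the idea is the same: related words point in similar directions.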

Several models and techniques have been developed to generate effective word embeddings:

  • Word2Vec: This tool, created by Google researchers, transforms words into numerical vectors. It uses two methods: Continuous Bag of Words (CBoW), which predicts a word based on its context, and Skip-Gram, which does the opposite by predicting the surrounding context from a word.
  • GloVe: Developed by Stanford University, GloVe turns words into vectors using a different approach. It analyzes how often pairs of words occur together in the entire text corpus to learn about their relationships.
  • FastText: Created by Facebook AI Research, FastText improves upon Word2Vec by breaking down words into smaller parts called character n-grams. This allows the model to better handle morphologically rich languages and words not seen during training.
Models

In practice, Word2Vec and FastText are the most commonly used models for generating word embeddings. Since FastText is essentially an enhanced version of Word2Vec, we will focus exclusively on Word2Vec in this course.

How Does Word2Vec Work?

Word2Vec transforms words into vectors using a process that starts with one-hot encoding, where each word in a vocabulary is represented by a unique vector marked by a single '1' among zeros. Let's take a look at an example:
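The encoding itself is straightforward; here is a minimal sketch over a hypothetical five-word vocabulary:

```python
# One-hot encoding over a tiny (hypothetical) vocabulary.
vocabulary = ["cat", "dog", "sat", "on", "mat"]

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's vocabulary index."""
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector

print(one_hot("dog", vocabulary))  # [0, 1, 0, 0, 0]
```

Note that each vector is as long as the vocabulary, with exactly one non-zero entry.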

One-hot encoding

This vector serves as input to a neural network, which is designed to 'learn' the word embeddings. The network's architecture can follow one of two models: CBoW (Continuous Bag of Words), which predicts a target word based on the context provided by surrounding words, or Skip-Gram, which, conversely, predicts the surrounding context words based on the target word.
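In both architectures, the embeddings live in the network's hidden-layer weight matrix: one row per vocabulary word. Multiplying a one-hot input vector by that matrix simply selects the matching row, which is that word's embedding. A sketch with made-up numbers for a three-word vocabulary and two embedding dimensions:

```python
# Hidden-layer weight matrix of a hypothetical tiny network:
# one row per vocabulary word, one column per embedding dimension.
W = [
    [0.2, 0.9],  # row for word index 0
    [0.5, 0.1],  # row for word index 1
    [0.7, 0.7],  # row for word index 2
]

one_hot_vector = [0, 1, 0]  # one-hot vector for word index 1

# Multiplying the one-hot vector by W zeroes out every row except the
# selected one, so the result is exactly that word's embedding.
embedding = [sum(x * w for x, w in zip(one_hot_vector, column))
             for column in zip(*W)]

print(embedding)  # equals W[1], i.e. [0.5, 0.1]
```

This is why Word2Vec training amounts to learning the entries of this matrix: after training, the rows are the word embeddings.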

In both Word2Vec architectures, during each training iteration, the model is provided with a target word and the words surrounding it as the context represented as one-hot encoded vectors. The training dataset is thus effectively composed of these pairs or groups, where each target word is associated with its surrounding context words.

Every word in the vocabulary takes a turn being the target as the model iterates through the text using a sliding context window technique. This technique systematically moves across every word, ensuring comprehensive learning from all possible contexts within the corpus.

A context window is a fixed-size span of words around a target word that the model uses to learn the word's context. Specifically, it dictates how many words before and after the target word are considered during the training process.

Let's take a look at an example with a window size of 2 to make things clear:

Sliding window

A context window size of 2 means the model will include up to 2 words from both the left and the right of the target word, as long as those words are available within the sentence boundaries. As you can see, if there are fewer than 2 words on either side, the model will include as many words as are available.
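The sliding-window behavior described above, including the truncation at sentence boundaries, can be sketched in a few lines (the example sentence and helper function are illustrative, not part of any library):

```python
def context_pairs(tokens, window=2):
    """Pair each target word with its context words from a sliding window.
    Near sentence boundaries the window is truncated to the words available."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]   # up to `window` words before
        right = tokens[i + 1:i + 1 + window]  # up to `window` words after
        pairs.append((target, left + right))
    return pairs

sentence = "the cat sat on the mat".split()
for target, context in context_pairs(sentence, window=2):
    print(target, context)
# The first target "the" has no words to its left, so its context
# is truncated to the two words on its right: ['cat', 'sat'].
```

Each printed pair corresponds to one training example: in CBoW the context predicts the target, while in Skip-Gram the target predicts each context word.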


Section 4. Chapter 1