Understanding Word2Vec: Creating Dense Vectors for Neural Networks
Word2Vec is a technique for learning dense vectors that represent words in neural networks. Sliding a window over the text distinguishes target and context words, which define the network's input and output layers. Through training, the neural network learns to predict target words while minimizing the loss. The number of neurons in the hidden layer determines the dimension of the vector embeddings.
Word2Vec: Creating Dense Vectors. James Pustejovsky, CS 114B, Spring 2023, Brandeis University
Distinguishing Target and Context Words Imagine that we are sliding the target word position from left to right over the sentence "life is like riding a bicycle": Target := life, Context+1 := is, Context+2 := like; then Target := like, Context-2 := life, Context-1 := is, Context+1 := riding, Context+2 := a
We continue through the entire sequence: Target := riding, Context-2 := is, Context-1 := like, Context+1 := a, Context+2 := bicycle
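To make the sliding window concrete, here is a minimal Python sketch that generates (target, context) pairs from the slide's sentence. The variable names and the window size of 2 are illustrative choices, not taken from the slides:

```python
# Generate (target, context) pairs by sliding a window over the sentence.
sentence = "life is like riding a bicycle".split()
window = 2

pairs = []
for i, target in enumerate(sentence):
    # Look at up to `window` neighbors on each side, staying inside the sentence.
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs[:4])
# [('life', 'is'), ('life', 'like'), ('is', 'life'), ('is', 'like')]
# For target "like": ('like', 'life'), ('like', 'is'), ('like', 'riding'), ('like', 'a')
```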
The Network Input layer: V = 10,000; embedding dimension = 300; output layer: V = 10,000
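As a rough sketch of what these dimensions mean in code, the trainable part of the network amounts to two weight matrices. The matrix names here are assumptions, and note that this slide uses 300 for the embedding dimension while later slides use 50:

```python
import numpy as np

V, d = 10_000, 300        # vocabulary size and embedding dimension from this slide

W_in = np.zeros((V, d))   # input-to-hidden weights: one d-number row per word
W_out = np.zeros((d, V))  # hidden-to-output weights: one score column per word

print(W_in.shape, W_out.shape)  # (10000, 300) (300, 10000)
```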
The Input Layer As mentioned above, both the source and the target words are picked from the corpus for training the neural network. Each word is represented as input using one-hot encoding: if there are 10,000 unique words in the corpus, the input is an array of length 10,000 in which exactly one element is 1 and the rest are 0 for any given word.
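A minimal sketch of one-hot encoding, assuming a toy six-word vocabulary in place of the 10,000-word one; the word-to-index mapping is an illustrative assumption:

```python
# Toy vocabulary: each word is assigned an index.
vocab = {"life": 0, "is": 1, "like": 2, "riding": 3, "a": 4, "bicycle": 5}

def one_hot(word, vocab_size):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = [0] * vocab_size
    vec[vocab[word]] = 1
    return vec

print(one_hot("like", len(vocab)))  # [0, 0, 1, 0, 0, 0]
```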
The Output Layer The output of the neural network is the probability of a particular word appearing near the given input word. If there are 10,000 words in the corpus, the output layer is an array of length 10,000; each element is a value between 0 and 1, representing the probability of that target word occurring near the input word.
Training The neural network predicts a target word given an input word: it takes the one-hot representation of the input word and outputs a probability for each candidate target word. To train the model and minimize the loss, we need actual values to compare against the predictions; the one-hot representations of the target words serve as the ground-truth values for the output layer. A softmax layer is used at the output of the neural network.
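Putting these pieces together, here is a sketch of a single training step with a softmax output and cross-entropy loss, using a toy vocabulary and the 50-dimensional hidden layer from the next slide. The gradient formulas are the standard softmax-plus-cross-entropy derivatives, which the slides do not spell out, and all names and indices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 6, 50, 0.05                  # toy vocab size, hidden dimension, learning rate
W_in = rng.normal(scale=0.1, size=(V, d))   # input-to-hidden weights
W_out = rng.normal(scale=0.1, size=(d, V))  # hidden-to-output weights

input_idx, target_idx = 2, 3            # e.g. input "like", target "riding" (assumed indices)

h = W_in[input_idx]                     # hidden layer: the one-hot input selects one row
scores = h @ W_out                      # one raw score per vocabulary word
probs = np.exp(scores - scores.max())   # subtract max for numerical stability
probs /= probs.sum()                    # softmax: probabilities summing to 1

loss = -np.log(probs[target_idx])       # cross-entropy against the one-hot target

# Gradient descent update (standard softmax + cross-entropy derivative).
d_scores = probs.copy()
d_scores[target_idx] -= 1.0             # dL/dscores = probs - one_hot(target)
grad_W_out = np.outer(h, d_scores)
grad_h = W_out @ d_scores
W_out -= lr * grad_W_out
W_in[input_idx] -= lr * grad_h
print(f"loss = {loss:.4f}")
```

One design note: computing a full softmax over a real 10,000-word vocabulary at every step is expensive, which is why practical Word2Vec implementations approximate it with techniques such as negative sampling.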
Hidden Layer The number of neurons in the hidden layer defines how (and to what dimension) we embed the 10,000-length sparse vectors. Let us take 50 as the dimension of the hidden layer. This means we are learning 50 features for each word that passes through the network.
The Hidden Layer First, let us look at the weights from the hidden layer's perspective. The first neuron in the hidden layer has 10,000 input connections, one from each element of the input array. But because the input layer is a one-hot representation, only one input comes in as 1 and the rest are 0, so only one of the 10,000 weights is passed on to the activation function for any given word. This is true for all 50 neurons in the hidden layer.
The Input Layer For any word, only one element of the 10,000-length input array is 1, because of the one-hot representation. That input element has 50 connections (weights), one to each of the 50 neurons in the hidden layer. When the next word comes as input, a different element takes the value 1 and contributes its own 50 connections (weights) to the hidden layer. So, in effect, each word in the corpus has its own set of 50 weights, used whenever that word appears as input. These 50 weights are the word vectors that represent the words in the corpus.
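The claim that each word's 50 weights are its vector can be checked directly: multiplying a one-hot vector by the weight matrix simply selects one row. A small sketch, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 6, 50                        # toy vocabulary, 50-dimensional embeddings
W_in = rng.normal(size=(V, d))      # input-to-hidden weight matrix

word_idx = 2                        # e.g. "like" (assumed index)
one_hot = np.zeros(V)
one_hot[word_idx] = 1.0

# The matrix product and the row lookup give the same 50 numbers.
assert np.allclose(one_hot @ W_in, W_in[word_idx])
word_vector = W_in[word_idx]        # this row is the word's embedding
```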
Material for slides thanks to Manoj Akella and Dhruvil Karani