Exploring Word Embeddings in Vision and Language: A Comprehensive Overview
Word embeddings represent words as compact vectors. This overview introduces the idea of word embeddings, starting from one-hot encodings and histograms of co-occurring words and building up to word2vec. It covers distributional semantics, compact representations, and the use of principal component analysis (PCA) to find basis vectors, and it presents word2vec as a versatile software package for word vector representation that includes the Continuous Bag of Words (CBoW) and Skip-Gram models together with several training methods.
Presentation Transcript
CS6501: Vision and Language Word Embeddings
Today: Distributional Semantics, Word2Vec
How to represent a word?
Problem: the distance between words is always the same under one-hot encodings.
dog (1) → [1 0 0 0 0 0 0 0 0 0]
cat (2) → [0 1 0 0 0 0 0 0 0 0]
person (3) → [0 0 1 0 0 0 0 0 0 0]
Idea: instead of one-hot encoding, use a histogram of commonly co-occurring words.
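As a quick illustration (a minimal sketch with a made-up three-word vocabulary, not from the slides), the Euclidean distance between any two distinct one-hot vectors is identical, which is exactly the problem noted above:

```python
import numpy as np

vocab = ["dog", "cat", "person"]      # toy vocabulary; index = word id
one_hot = np.eye(len(vocab))          # row i is the one-hot vector for vocab[i]

dog, cat, person = one_hot
print(np.linalg.norm(dog - cat))      # 1.414... (sqrt(2))
print(np.linalg.norm(dog - person))   # 1.414... -- the same for every distinct pair
```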
Distributional Semantics
"Dogs are man's best friend." "I saw a dog on a leash walking in the park." "His dog is his best companion." "He walks his dog in the late afternoon."
Counting context words (walking, sleeps, friend, walks, leash, food, park, runs, legs, sits) gives a histogram for dog:
dog → [3 2 3 4 2 4 3 5 6 7]
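A rough sketch of how such a histogram could be computed; the tokenization, the window size, and the crude "dog"/"dogs" match are illustrative choices rather than the exact setup behind the slide:

```python
from collections import Counter

corpus = [
    "dogs are man's best friend",
    "i saw a dog on a leash walking in the park",
    "his dog is his best companion",
    "he walks his dog in the late afternoon",
]

target = "dog"
window = 3
counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.startswith(target):  # crude match for "dog" / "dogs"
            context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
            counts.update(context)

print(counts.most_common(10))  # counts for words like "walking", "leash", "park", ...
```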
Distributional Semantics
Context words: invented, window, mouse, mirror, sleeps, walks, food, runs, legs, tail
dog → [5 5 0 5 0 0 5 5 0 2]
cat → [5 4 1 4 2 0 3 4 0 3]
person → [5 5 1 5 0 2 5 5 0 0]
This vocabulary can be extremely large.
Toward more Compact Representations
The same co-occurrence vectors for dog, cat, and person as above: because the context vocabulary can be extremely large, we want a more compact representation.
Toward more Compact Representations
dog = [5 5 0 5 0 0 5 5 0 2]
    = w1 · [0 1 0 1 0 0 0 1 0 0]  (legs, running, walking)
    + w2 · [0 0 0 0 0 0 0 0 0 1]  (tail, fur, ears)
    + w3 · [0 0 1 0 0 0 0 0 1 0]  (mirror, window, door)
Toward more Compact Representations
dog = (w1, w2, w3): the weights on the basis vectors serve as a compact representation of the word. The basis vectors can be found using Principal Component Analysis (PCA).
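A hedged sketch of this step, assuming scikit-learn's PCA and reusing the toy count matrix from the previous slides; a real co-occurrence matrix would have one row per word and many thousands of columns:

```python
import numpy as np
from sklearn.decomposition import PCA

# rows: dog, cat, person; columns: the 10 context words listed above
counts = np.array([
    [5, 5, 0, 5, 0, 0, 5, 5, 0, 2],
    [5, 4, 1, 4, 2, 0, 3, 4, 0, 3],
    [5, 5, 1, 5, 0, 2, 5, 5, 0, 0],
], dtype=float)

pca = PCA(n_components=2)            # keep 2 latent dimensions
compact = pca.fit_transform(counts)  # each word becomes a 2-d vector of weights
print(compact)                       # compact word representations
print(pca.components_)               # the learned basis vectors
```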
What is word2vec?
word2vec is not a single algorithm. It is a software package for representing words as vectors, containing:
Two distinct models: CBoW and Skip-Gram (SG)
Various training methods: Negative Sampling (NS) and Hierarchical Softmax
A rich preprocessing pipeline: dynamic context windows, subsampling, deleting rare words
Slide by Omer Levy
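As a usage sketch (assuming the gensim library with its 4.x parameter names; this is not part of the slides), the components above map onto word2vec settings roughly as follows: sg selects Skip-Gram vs. CBoW, hs and negative select hierarchical softmax vs. negative sampling, window controls the context window, and sample enables subsampling of frequent words:

```python
from gensim.models import Word2Vec

sentences = [
    ["he", "walks", "his", "dog", "in", "the", "park"],
    ["his", "dog", "is", "his", "best", "companion"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # d latent dimensions
    window=5,          # context window
    min_count=1,       # keep rare words in this toy corpus
    sg=1,              # Skip-Gram (sg=0 would be CBoW)
    hs=0, negative=5,  # negative sampling with 5 negatives (hs=1 for hierarchical softmax)
    sample=1e-3,       # subsampling of frequent words
)
print(model.wv["dog"][:5])  # first few dimensions of the learned vector
```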
Embeddings capture relational meaning!
vector('king') - vector('man') + vector('woman') ≈ vector('queen')
vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome')
Slide by Dan Jurafsky
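A minimal sketch of the analogy arithmetic; the embedding dictionary below is random and purely illustrative, so the printed answer is arbitrary here, whereas with trained word2vec vectors the nearest neighbour of the result tends to be "queen":

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["king", "man", "woman", "queen", "apple"]}

target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(emb[w], target))
print(best)  # with trained embeddings this is typically "queen"
```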
Skip-Grams with Negative Sampling (SGNS)
"Marco saw a furry little wampimuk hiding in the tree."
(word2vec Explained, Goldberg & Levy, arXiv 2014)
Skip-Grams with Negative Sampling (SGNS)
"Marco saw a furry little wampimuk hiding in the tree."
Extracted (word, context) training data:
(wampimuk, furry), (wampimuk, little), (wampimuk, hiding), (wampimuk, in)
(word2vec Explained, Goldberg & Levy, arXiv 2014)
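A short sketch of how such (word, context) pairs can be extracted with a symmetric window; the window size of 2 is an assumption that happens to reproduce the pairs above:

```python
sentence = "marco saw a furry little wampimuk hiding in the tree".split()
window = 2

pairs = []
for i, word in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((word, sentence[j]))

print([p for p in pairs if p[0] == "wampimuk"])
# [('wampimuk', 'furry'), ('wampimuk', 'little'), ('wampimuk', 'hiding'), ('wampimuk', 'in')]
```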
Skip-Grams with Negative Sampling (SGNS)
SGNS finds a vector w for each word w in our vocabulary V_W. Each such vector has d latent dimensions (e.g. d = 100). Effectively, it learns a matrix W whose rows represent the words in V_W. Key point: it also learns a similar auxiliary matrix C of context vectors. In fact, each word has two embeddings, e.g.:
w:wampimuk = (-3.1, 4.15, 9.2, -6.5, ...)
c:wampimuk = (-5.6, 2.95, 1.4, -1.3, ...)
(word2vec Explained, Goldberg & Levy, arXiv 2014)
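A tiny sketch of these two parameter matrices, with an illustrative vocabulary size and a hypothetical word-to-index mapping:

```python
import numpy as np

V, d = 10_000, 100                      # vocabulary size and latent dimensions (d = 100 as above)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, d))  # word vectors, one row per word in V_W
C = rng.normal(scale=0.1, size=(V, d))  # auxiliary context vectors, same shape

word_to_id = {"wampimuk": 0}            # hypothetical index; a real model maps every word
print(W[word_to_id["wampimuk"]][:4])    # the word embedding
print(C[word_to_id["wampimuk"]][:4])    # the context embedding of the same word
```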
Word2Vec Objective
Maximize the average log probability of the context words within a window of size m around each center word:
$$ J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log p(w_{t+j} \mid w_t) $$
Word2Vec Objective
$$ p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)} $$
where c is the center word, o is a context word, v_c and u_o are their center and context vectors, and the sum runs over the whole vocabulary.
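A sketch of evaluating this softmax with randomly initialized center and context matrices; the names U and V_mat and the sizes are illustrative, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 1_000, 50
U = rng.normal(size=(vocab_size, d))       # context ("output") vectors u_w
V_mat = rng.normal(size=(vocab_size, d))   # center ("input") vectors v_w

def p_context_given_center(o, c):
    scores = U @ V_mat[c]                  # u_w . v_c for every word w
    scores -= scores.max()                 # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_context_given_center(o=42, c=7))
```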
Skip-Grams with Negative Sampling (SGNS)
Maximize σ(c · w): the context c was observed with the word w.
Observed pairs: (wampimuk, furry), (wampimuk, little), (wampimuk, hiding), (wampimuk, in)
(word2vec Explained, Goldberg & Levy, arXiv 2014)
Skip-Grams with Negative Sampling (SGNS)
Maximize σ(c · w): the context c was observed with the word w.
Observed pairs: (wampimuk, furry), (wampimuk, little), (wampimuk, hiding), (wampimuk, in)
Minimize σ(c' · w): the context c' was hallucinated with the word w.
Hallucinated pairs: (wampimuk, Australia), (wampimuk, cyber), (wampimuk, the), (wampimuk, 1985)
(word2vec Explained, Goldberg & Levy, arXiv 2014)
Skip-Grams with Negative Sampling (SGNS)
Negative Sampling: SGNS samples k contexts c' at random as negative examples. "Random" here means the unigram distribution P(c) = #c / |D|, where #c is the number of times context c occurs in the corpus D. Spoiler: changing this distribution has a significant effect.
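A rough sketch of one SGNS update under this scheme; it is a simplification, not the original word2vec implementation, and note that released word2vec actually smooths the unigram distribution (counts raised to the 3/4 power), presumably the change hinted at by the spoiler:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
V, d, k, lr = 1_000, 50, 5, 0.025          # vocab size, dimensions, negatives, learning rate
W = rng.normal(scale=0.1, size=(V, d))     # word vectors
C = rng.normal(scale=0.1, size=(V, d))     # context vectors
counts = rng.integers(1, 100, size=V)      # stand-in for corpus context counts #c
unigram = counts / counts.sum()            # P(c) = #c / |D|

def sgns_step(w, c_pos):
    """One stochastic update for an observed (word, context) pair."""
    negs = rng.choice(V, size=k, p=unigram)    # "hallucinated" contexts
    w_vec = W[w].copy()
    # observed pair: push sigmoid(c_pos . w) toward 1
    g = 1.0 - sigmoid(C[c_pos] @ w_vec)
    grad_w = g * C[c_pos]
    C[c_pos] += lr * g * w_vec
    # sampled negatives: push sigmoid(c_neg . w) toward 0
    for c_neg in negs:
        g = -sigmoid(C[c_neg] @ w_vec)
        grad_w += g * C[c_neg]
        C[c_neg] += lr * g * w_vec
    W[w] += lr * grad_w

sgns_step(w=3, c_pos=17)
```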
Questions?