Exploring Word Embeddings in Vision and Language: A Comprehensive Overview

Word embeddings play a crucial role in representing words as compact vectors. This comprehensive overview delves into the concept of word embeddings, discussing approaches like one-hot encoding, histograms of co-occurring words, and more advanced techniques like word2vec. The exploration covers topics such as distributional semantics, compact representations, and the use of principal component analysis (PCA) to find basis vectors. Additionally, it highlights the importance of word2vec as a versatile software package for word vector representation, encompassing models like Continuous Bag of Words (CBoW) and Skip-Gram with various training methods.


Presentation Transcript


  1. CS6501: Vision and Language Word Embeddings

  2. Today: Distributional Semantics, Word2Vec

  3. How to represent a word? Problem: the distance between words under one-hot encodings is always the same, e.g. dog = [1 0 0 0 0 0 0 0 0 0], cat = [0 1 0 0 0 0 0 0 0 0], person = [0 0 1 0 0 0 0 0 0 0]. Idea: instead of one-hot encoding, use a histogram of commonly co-occurring words.
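
A quick sketch of the problem, using a hypothetical 10-word vocabulary: every pair of distinct one-hot vectors is exactly the same distance apart, so the representation says nothing about which words are similar.

```python
import numpy as np

# One-hot vectors over a toy 10-word vocabulary (dog, cat, person, ...)
vocab_size = 10
dog, cat, person = np.eye(vocab_size)[[0, 1, 2]]

# Euclidean distance between any two distinct one-hot vectors is sqrt(2),
# and their dot product (cosine similarity) is always 0.
print(np.linalg.norm(dog - cat), np.linalg.norm(dog - person))   # 1.414... 1.414...
print(dog @ cat, dog @ person)                                   # 0.0 0.0
```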

  4. Distributional Semantics. "Dogs are man's best friend." "I saw a dog on a leash walking in the park." "His dog is his best companion." "He walks his dog in the late afternoon." Counting context words (walking, sleeps, friend, walks, leash, food, park, runs, legs, sits) gives a histogram for dog: [3 2 3 4 2 4 3 5 6 7].
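
A minimal sketch of how such a histogram could be computed: count how often each context word appears within a small window around the target word. The corpus, the context-word list, and the window size below are illustrative assumptions rather than the exact ones behind the slide's numbers.

```python
from collections import Counter

corpus = [
    "dogs are man's best friend",
    "i saw a dog on a leash walking in the park",
    "his dog is his best companion",
    "he walks his dog in the late afternoon",
]
context_words = ["walking", "friend", "walks", "leash", "park", "best"]
window = 4  # words to the left and right that count as "co-occurring"

def cooccurrence_histogram(target):
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(w for w in neighbors if w in context_words)
    return [counts[w] for w in context_words]

print(cooccurrence_histogram("dog"))  # histogram of co-occurring context words
```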

  5. Distributional Semantics. Over context words (invented, window, mouse, mirror, sleeps, walks, food, runs, legs, tail): dog = [5 5 0 5 0 0 5 5 0 2], cat = [5 4 1 4 2 0 3 4 0 3], person = [5 5 1 5 0 2 5 5 0 0]. This vocabulary can be extremely large.

  6. Toward more Compact Representations. Same counts as before: dog = [5 5 0 5 0 0 5 5 0 2], cat = [5 4 1 4 2 0 3 4 0 3], person = [5 5 1 5 0 2 5 5 0 0], over the context words (invented, window, mouse, mirror, sleeps, walks, food, runs, legs, tail). This vocabulary can be extremely large.

  7. Toward more Compact Representations. dog = [5 5 0 5 0 0 5 5 0 2] = w1 · [0 1 0 1 0 0 0 1 0 0] + w2 · [0 0 0 0 0 0 0 0 0 1] + w3 · [0 0 1 0 0 0 0 0 1 0], where the basis vectors group related context words (legs, running, walking; tail, fur, ears; mirror, window, door).

  8. Toward more Compact Representations. dog = (w1, w2, w3): the word is represented by its weights on the basis vectors, and the basis vectors can be found using Principal Component Analysis (PCA).
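
A minimal sketch of that PCA step, using the small co-occurrence matrix from the previous slides: the principal directions act as the basis vectors, and each word is then described by a short list of weights (w1, w2, ...).

```python
import numpy as np

# Rows: words (dog, cat, person); columns: counts of co-occurring context words.
X = np.array([
    [5, 5, 0, 5, 0, 0, 5, 5, 0, 2],   # dog
    [5, 4, 1, 4, 2, 0, 3, 4, 0, 3],   # cat
    [5, 5, 1, 5, 0, 2, 5, 5, 0, 0],   # person
], dtype=float)

# PCA via SVD of the mean-centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                      # keep only a few basis vectors (2 here, since we only have 3 words)
basis = Vt[:k]             # principal directions in context-word space
weights = Xc @ basis.T     # compact (w1, w2, ...) representation of each word

print(weights)             # each row is now a short dense vector for dog, cat, person
```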

  9. What is word2vec? word2vec is not a single algorithm. It is a software package for representing words as vectors, containing: two distinct models (CBoW and Skip-Gram, SG), various training methods (Negative Sampling, NS, and Hierarchical Softmax), and a rich preprocessing pipeline (dynamic context windows, subsampling, deleting rare words). Slide by Omer Levy.
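
As a practical aside (not part of the original package), the gensim library exposes the same choices as constructor flags; a rough usage sketch with an illustrative toy corpus and parameter values:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
sentences = [
    ["he", "walks", "his", "dog", "in", "the", "park"],
    ["the", "cat", "sleeps", "on", "the", "mirror"],
]

# sg=1 selects Skip-Gram (sg=0 would be CBoW); negative=5 selects negative
# sampling with 5 negatives (hs=1 would switch to hierarchical softmax).
model = Word2Vec(sentences, vector_size=100, window=5, sg=1,
                 negative=5, min_count=1, epochs=50)

print(model.wv["dog"][:5])                  # first few dimensions of the embedding
print(model.wv.most_similar("dog", topn=3))
```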

  10. Embeddings capture relational meaning! vector('king') - vector('man') + vector('woman') ≈ vector('queen'); vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome'). Slide by Dan Jurafsky.
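
A minimal sketch of how such an analogy query is usually answered: form vector('king') - vector('man') + vector('woman') and return the nearest vocabulary word by cosine similarity, excluding the query words. The tiny random embedding table below is a placeholder assumption; with real trained embeddings the nearest word tends to be 'queen'.

```python
import numpy as np

# Hypothetical embedding table: word -> dense vector (values are placeholders).
rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "paris", "france", "italy", "rome"]
E = {w: rng.normal(size=50) for w in vocab}

def analogy(a, b, c):
    """Return the word d that best completes 'a - b + c ~ d' by cosine similarity."""
    target = E[a] - E[b] + E[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(E[w], target))

print(analogy("king", "man", "woman"))   # with trained embeddings this tends to be "queen"
```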

  11. Skip-Grams with Negative Sampling (SGNS). "Marco saw a furry little wampimuk hiding in the tree." (word2vec Explained, Goldberg & Levy, arXiv 2014)

  12. Skip-Grams with Negative Sampling (SGNS). "Marco saw a furry little wampimuk hiding in the tree." (word2vec Explained, Goldberg & Levy, arXiv 2014)

  13. Skip-Grams with Negative Sampling (SGNS). "Marco saw a furry little wampimuk hiding in the tree." Extracted (word, context) pairs in the data D: (wampimuk, furry), (wampimuk, little), (wampimuk, hiding), (wampimuk, in). (word2vec Explained, Goldberg & Levy, arXiv 2014)
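
A minimal sketch of how such (word, context) pairs could be extracted, assuming a symmetric window of two words on each side (the window size here is an illustrative assumption):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (word, context) training pairs for a symmetric context window."""
    pairs = []
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs

sentence = "marco saw a furry little wampimuk hiding in the tree".split()
for w, c in skipgram_pairs(sentence, window=2):
    if w == "wampimuk":
        print(w, c)   # wampimuk furry / little / hiding / in
```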

  14. Skip-Grams with Negative Sampling (SGNS). SGNS finds a vector w for each word w in our vocabulary V_W. Each such vector has d latent dimensions (e.g. d = 100). Effectively, it learns a matrix W whose rows represent the words of V_W. Key point: it also learns a similar auxiliary matrix C of context vectors. In fact, each word has two embeddings, e.g. w(wampimuk) = (-3.1, 4.15, 9.2, -6.5, ...) and c(wampimuk) = (-5.6, 2.95, 1.4, -1.3, ...). (word2vec Explained, Goldberg & Levy, arXiv 2014)

  15. Word2Vec Objective: maximize the average log probability of context words given each word, $\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$.

  16. Word2Vec Objective: the conditional probability is a softmax over the whole vocabulary, $p(w_O \mid w_I) = \frac{\exp({v'_{w_O}}^{\top} v_{w_I})}{\sum_{w=1}^{W} \exp({v'_w}^{\top} v_{w_I})}$.
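
A small NumPy sketch of this softmax, assuming the word (input) vectors and context (output) vectors are stored as two matrices; the vocabulary size, dimension, and random values are placeholder assumptions. Note that the normalization sums over the entire vocabulary, which is the expensive step that negative sampling (next slides) sidesteps.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 1000, 100                             # vocabulary size and embedding dimension (assumptions)
W_in = rng.normal(scale=0.1, size=(V, d))    # input (word) vectors v_w
W_out = rng.normal(scale=0.1, size=(V, d))   # output (context) vectors v'_w

def p_context_given_word(context_id, word_id):
    """Softmax over all V context words -- the costly normalization SGNS avoids."""
    scores = W_out @ W_in[word_id]           # dot products v'_w . v_{w_I} for every w
    scores -= scores.max()                   # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[context_id]

print(p_context_given_word(42, 7))
```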

  17. Skip-Grams with Negative Sampling (SGNS). Maximize $\sigma(w \cdot c)$: c was observed with w. Observed (word, context) pairs: (wampimuk, furry), (wampimuk, little), (wampimuk, hiding), (wampimuk, in). (word2vec Explained, Goldberg & Levy, arXiv 2014)

  18. Skip-Grams with Negative Sampling (SGNS). Maximize $\sigma(w \cdot c)$ when c was observed with w, e.g. (wampimuk, furry), (wampimuk, little), (wampimuk, hiding), (wampimuk, in). Minimize $\sigma(w \cdot c')$ when c' was hallucinated with w, e.g. (wampimuk, Australia), (wampimuk, cyber), (wampimuk, the), (wampimuk, 1985). (word2vec Explained, Goldberg & Levy, arXiv 2014)

  19. Skip-Grams with Negative Sampling (SGNS). Negative sampling: SGNS samples $k$ contexts $c'$ at random as negative examples. "Random" means the unigram distribution $P(c) = \frac{\#c}{|D|}$. Spoiler: changing this distribution has a significant effect.
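
A minimal sketch of one SGNS update under these definitions: increase $\sigma(w \cdot c)$ for the observed pair and decrease $\sigma(w \cdot c')$ for $k$ negatives drawn from the unigram distribution (here smoothed with the exponent 0.75 used by the released word2vec code, one example of the distribution change the spoiler alludes to). Vocabulary size, dimension, learning rate, and the random counts are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
V, d, k, lr = 1000, 100, 5, 0.025
W_in = rng.normal(scale=0.1, size=(V, d))    # word vectors (matrix W)
W_out = rng.normal(scale=0.1, size=(V, d))   # context vectors (matrix C)

unigram_counts = rng.integers(1, 1000, size=V).astype(float)  # placeholder corpus counts
noise = unigram_counts ** 0.75               # smoothed unigram noise distribution
noise /= noise.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(word_id, context_id):
    """One gradient step: push the observed pair together, sampled negatives apart."""
    neg_ids = rng.choice(V, size=k, p=noise)
    w = W_in[word_id].copy()
    grad_w = np.zeros(d)
    # Positive pair: gradient of log sigma(w . c)
    c_pos = W_out[context_id]
    g = 1.0 - sigmoid(w @ c_pos)
    grad_w += g * c_pos
    W_out[context_id] = c_pos + lr * g * w
    # Negative pairs: gradient of log sigma(-w . c')
    for nid in neg_ids:
        c_neg = W_out[nid]
        g = -sigmoid(w @ c_neg)
        grad_w += g * c_neg
        W_out[nid] = c_neg + lr * g * w
    W_in[word_id] += lr * grad_w

sgns_step(7, 42)   # e.g., one (word, context) pair from the extracted data D
```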

  20. Questions?
