Advancements in Word Embeddings through Dependency-Based Techniques
"Explore the evolution of word embeddings with a focus on dependency-based methods, showcasing innovations like Skip-Gram with Negative Sampling. Learn about Generalizing Skip-Gram and the shift towards analyzing linguistically rich embeddings using various contexts such as bag-of-words and syntactic dependencies."
Dependency-Based Word Embeddings
Omer Levy and Yoav Goldberg
Bar-Ilan University, Israel
Neural Embeddings
Dense vectors; each dimension is a latent feature.
word2vec (Mikolov et al., 2013). State-of-the-art: Skip-Gram with Negative Sampling.
Linguistic regularities: king - man + woman = queen
See also: Linguistic Regularities in Sparse and Explicit Word Representations, Friday, 2:00 PM, CoNLL 2014
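As a rough illustration of such regularities, the analogy can be checked with plain vector arithmetic over any set of pretrained embeddings. This is a minimal sketch; the `vectors` dictionary is a hypothetical stand-in for whatever embeddings are loaded and is not part of the original slides.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(vectors, a, b, c, topn=1):
    """Return the word(s) whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [(w, cosine(target, v)) for w, v in vectors.items()
                  if w not in (a, b, c)]          # exclude the query words
    return sorted(candidates, key=lambda x: -x[1])[:topn]

# Hypothetical usage, with pretrained embeddings loaded into a dict of arrays:
# vectors = {"king": np.array([...]), "man": np.array([...]), ...}
# analogy(vectors, "king", "man", "woman")   # ideally returns [("queen", ...)]
```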
Our Main Contribution: Generalizing Skip-Gram with Negative Sampling
Skip-Gram with Negative Sampling v2.0
The original implementation assumes bag-of-words contexts.
We generalize to arbitrary contexts.
Dependency contexts create qualitatively different word embeddings.
This provides a new tool for linguistically analyzing embeddings.
Example: Australian scientist discovers star with telescope
Target word: discovers
Bag-of-words (BoW) context: the words in a window around the target (Australian, scientist, star, with, telescope).
Syntactic dependency context: the words linked to the target by dependency arcs (nsubj, dobj, prep_with), i.e. scientist/nsubj, star/dobj, telescope/prep_with.
How does Skip-Gram work?
Skip-Gram represents each word w as a vector.
Skip-Gram represents each context word c as a different vector.
The same word therefore has two different embeddings: one as a word, one as a context.
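A minimal sketch of those two embedding tables and a single SGNS update step, assuming toy dimensions, random initialization, and plain SGD; this is illustrative only, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, ctx_size, dim = 10_000, 10_000, 100

W = rng.normal(scale=0.1, size=(vocab_size, dim))   # word embeddings
C = rng.normal(scale=0.1, size=(ctx_size, dim))     # separate context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(word, context, negatives, lr=0.025):
    """One SGNS update: pull (word, context) together, push negative samples apart."""
    w = W[word].copy()                   # snapshot so updates don't interleave
    w_grad = np.zeros_like(w)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(np.dot(w, C[c]))
        g = lr * (label - score)
        w_grad += g * C[c]
        C[c]   += g * w
    W[word] += w_grad
```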
How does Skip-Gram work?
Text: ... w(i-2), w(i-1), w(i), w(i+1), w(i+2) ...
Bag-of-Words Contexts: each word within the window around w(i)
Word-Context Pairs -> Learning
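For concreteness, a sketch of how bag-of-words (word, context) pairs are read off a tokenized sentence with a symmetric window; the window size k and the function name are illustrative choices.

```python
def bow_pairs(tokens, k=2):
    """Yield (word, context) pairs from a symmetric window of size k."""
    for i, word in enumerate(tokens):
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i:
                yield word, tokens[j]

sentence = "australian scientist discovers star with telescope".split()
print(list(bow_pairs(sentence, k=2)))
# for "discovers" this yields ('discovers', 'australian'), ('discovers', 'scientist'),
# ('discovers', 'star'), ('discovers', 'with')
```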
Our Modification
Text -> Arbitrary Contexts -> (word, context) pairs -> Learning
Modified word2vec publicly available!
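The learning step only ever consumes (word, context) pairs, so any pair source can be plugged in. Below is a sketch of writing such pairs to a plain text file, one pair per line; the exact input format expected by the released modified word2vec should be taken from its own documentation, this only illustrates the decoupling.

```python
def write_pairs(pairs, path):
    """Write arbitrary (word, context) pairs, one whitespace-separated pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for word, context in pairs:
            f.write(f"{word} {context}\n")

# Any notion of context works here: window words, dependency relations,
# document ids, etc. The training code never sees anything but the pairs.
write_pairs([("discovers", "scientist/nsubj"),
             ("discovers", "star/dobj")], "pairs.txt")
```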
Our Modification: Example
Text (Wikipedia) -> Syntactic Contexts (Stanford Dependencies) -> (word, context) pairs -> Learning
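A rough sketch of extracting dependency contexts from parsed text. The slides use the Stanford Dependencies scheme (with collapsed prepositions); spaCy is used here purely as an illustrative stand-in, so labels and preposition handling will differ from the original setup.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def dep_contexts(sentence):
    """Collect (word, context) pairs from dependency arcs, in both directions."""
    pairs = []
    for tok in nlp(sentence):
        if tok.dep_ == "ROOT":
            continue
        head = tok.head
        pairs.append((head.text, f"{tok.text}/{tok.dep_}"))     # head -> modifier
        pairs.append((tok.text, f"{head.text}/{tok.dep_}-1"))   # inverse relation
    return pairs

print(dep_contexts("Australian scientist discovers star with telescope"))
# e.g. ('discovers', 'scientist/nsubj'), ('scientist', 'discovers/nsubj-1'), ...
```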
What is the effect of different context types?
Thoroughly studied in explicit (distributional) representations: Lin (1998), Padó and Lapata (2007), and many others.
General conclusion: bag-of-words contexts induce topical similarities; dependency contexts induce functional similarities (words that share the same semantic type, i.e. cohyponyms).
Does this hold for embeddings as well?
Embedding Similarity with Different Contexts
Target word: Hogwarts (Harry Potter's school)
Bag of Words (k=5): Dumbledore, hallows, half-blood, Malfoy, Snape (related to Harry Potter)
Dependencies: Sunnydale, Collinwood, Calarts, Greendale, Millfield (schools)
Embedding Similarity with Different Contexts
Target word: Turing (computer scientist)
Bag of Words (k=5): nondeterministic, non-deterministic, computability, deterministic, finite-state (related to computability)
Dependencies: Pauling, Hotelling, Heting, Lessing, Hamming (scientists)
Embedding Similarity with Different Contexts
Target word: dancing (gerund of dance)
Bag of Words (k=5): singing, dance, dances, dancers, tap-dancing (related to dance)
Dependencies: singing, rapping, breakdancing, miming, busking (gerunds)
Online Demo!
Embedding Similarity with Different Contexts
Dependency-based embeddings have more functional similarities. This phenomenon goes beyond these examples; see the quantitative analysis in the paper.
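The top-5 lists above are simply nearest-neighbor queries under cosine similarity. A sketch of such a query, assuming an embedding matrix `W` and a word-to-row index map `vocab` (hypothetical names, not from the slides):

```python
import numpy as np

def nearest_neighbors(word, vocab, W, k=5):
    """Top-k words most similar to `word` by cosine similarity."""
    v = W[vocab[word]]
    sims = (W @ v) / (np.linalg.norm(W, axis=1) * np.linalg.norm(v))
    inv = {i: w for w, i in vocab.items()}
    ranked = [(inv[i], float(sims[i])) for i in np.argsort(-sims) if inv[i] != word]
    return ranked[:k]

# Running this over BoW-trained vs. dependency-trained vectors for a word like
# "hogwarts" would reproduce the kind of contrast shown in the tables above.
```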
Quantitative Analysis
[Recall-precision curve comparing Dependencies, BoW (k=2), and BoW (k=5); x-axis: Recall, y-axis: Precision]
Dependency-based embeddings have more functional similarities.
Dependency Contexts & Functional Similarity Thoroughly studied in explicit representations (distributional) Lin (1998), Pad and Lapata (2007), and many others In explicit representations, we can look at the features and analyze But embeddings are a black box! Dimensions are latent and don t necessarily have any meaning
Peeking into Skip-Gram's Black Box
Skip-Gram allows a peek: contexts are embedded in the same space!
Given a word w, find the contexts c it activates most: argmax over c of the inner product w · c.
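A minimal sketch of that peek, assuming word vectors `W` and context vectors `C` from the same trained model plus index maps `vocab` and `ctx_vocab` (hypothetical names): rank all contexts by their inner product with the word's vector.

```python
import numpy as np

def top_contexts(word, vocab, ctx_vocab, W, C, k=5):
    """Contexts with the highest activation (inner product) for the given word."""
    scores = C @ W[vocab[word]]
    inv = {i: c for c, i in ctx_vocab.items()}
    return [(inv[i], float(scores[i])) for i in np.argsort(-scores)[:k]]
```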
Associated Contexts
Target word: Hogwarts
Top dependency contexts: students/prep_at-1, educated/prep_at-1, student/prep_at-1, stay/prep_at-1, learned/prep_at-1
Associated Contexts
Target word: Turing
Top dependency contexts: machine/nn-1, test/nn-1, theorem/poss-1, machines/nn-1, tests/nn-1
Associated Contexts
Target word: dancing
Top dependency contexts: dancing/conj, dancing/conj-1, singing/conj-1, singing/conj, ballroom/nn
Analyzing Embeddings
We found a way to linguistically analyze embeddings. Together with the ability to engineer contexts, we now have the tools to create task-tailored embeddings!
Conclusion
Generalized Skip-Gram with Negative Sampling to arbitrary contexts.
Different contexts induce different similarities.
Suggested a way to peek inside the black box of embeddings.
Code, demo, and word vectors are available from our websites.
Make linguistically-motivated, task-tailored embeddings today!
Thank you for listening :)