Key Insights into Neural Embeddings and Word Representations


A comparison of neural embeddings and explicit word representations, and an explanation of why vector arithmetic reveals analogies. The presentation examines sparse and dense vectors for representing words, focusing on the linguistic regularities and geometric patterns behind neural embeddings.


Presentation Transcript


  1. Linguistic Regularities in Sparse and Explicit Word Representations. Omer Levy and Yoav Goldberg, Bar-Ilan University, Israel

  2. Papers in ACL 2014* [chart: share of papers on Neural Networks & Word Embeddings vs. Other Topics] * Sampling error: +/- 100%

  3. Neural Embeddings. Dense vectors; each dimension is a latent feature. Common software package: word2vec. Example vector: (7.35, 9.42, 0.88, …), dimensionality ≈ 100. Magic: king − man + woman = queen (analogies)
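To make the word2vec "magic" concrete, here is a minimal sketch using the gensim library; the vector file path is a placeholder for whatever pretrained word2vec vectors you have locally (an illustration, not part of the original slides).

```python
# Sketch: querying pretrained word2vec vectors for the king - man + woman analogy.
# Assumes gensim is installed; "vectors.bin" is a placeholder path to a word2vec file.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# positive/negative terms implement the additive analogy recovery discussed later.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# With reasonable vectors this typically returns [('queen', <score>)].
```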

  4. Representing words as vectors is not new!

  5. Explicit Representations (Distributional). Sparse vectors; each dimension is an explicit context. Common association metric: PMI / PPMI. Example vector: {context1: 17, context2: 5, context3: 2, …}, dimensionality ≈ 100,000. Does the same magic work for explicit representations too? Baroni et al. (2014) showed that embeddings outperform explicit representations, but…
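For contrast with the dense case above, a small sketch of how an explicit (distributional) representation is typically weighted with PPMI; the toy count matrix and variable names are illustrative assumptions, not the authors' code.

```python
# Sketch: turning a word-context co-occurrence count matrix into PPMI weights.
import numpy as np

def ppmi(counts: np.ndarray) -> np.ndarray:
    total = counts.sum()
    p_wc = counts / total                              # joint P(w, c)
    p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P(w)
    p_c = counts.sum(axis=0, keepdims=True) / total    # marginal P(c)
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)                        # PPMI: clip negative PMI to zero

# Toy counts: 2 words x 3 contexts (values are made up for illustration).
counts = np.array([[17.0, 5.0, 2.0], [3.0, 1.0, 8.0]])
print(ppmi(counts))
```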

  6. Questions. Are analogies unique to neural embeddings? Compare neural embeddings with explicit representations. Why does vector arithmetic reveal analogies? Unravel the mystery behind neural embeddings and their magic.

  7. Background

  8. Mikolov et al. (2013a,b,c) Neural embeddings have interesting geometries

  9. Mikolov et al. (2013a,b,c) Neural embeddings have interesting geometries These patterns capture relational similarities Can be used to solve analogies: man is to woman as king is to queen

  10. Mikolov et al. (2013a,b,c) Neural embeddings have interesting geometries. These patterns capture relational similarities. Can be used to solve analogies: a is to b as c is to d. Can be recovered by simple vector arithmetic: c − a = d − b

  11. Mikolov et al. (2013a,b,c) Neural embeddings have interesting geometries. These patterns capture relational similarities. Can be used to solve analogies: a is to b as c is to d. With simple vector arithmetic: c − a = d − b

  12. Mikolov et al. (2013a,b,c) king − man = queen − woman

  13. Mikolov et al. (2013a,b,c) king − man + woman = queen

  14. Mikolov et al. (2013a,b,c) king − man + woman = queen

  15. Mikolov et al. (2013a,b,c) Tokyo − Japan + France = Paris

  16. Mikolov et al. (2013a,b,c) best − good + strong = strongest

  17. Mikolov et al. (2013a,b,c) best − good + strong = strongest (searching over all vectors in V, the vocabulary)

  18. Are analogies unique to neural embeddings?

  19. Are analogies unique to neural embeddings? Experiment: compare embeddings to explicit representations

  20. Are analogies unique to neural embeddings? Experiment: compare embeddings to explicit representations

  21. Are analogies unique to neural embeddings? Experiment: compare embeddings to explicit representations. Learn different representations from the same corpus:

  22. Are analogies unique to neural embeddings? Experiment: compare embeddings to explicit representations. Learn different representations from the same corpus. Evaluate with the same recovery method: d = argmax_{x ∈ V} cos(x, c − a + b)
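To spell out this shared recovery method, here is a compact numpy sketch of the additive objective over unit-normalized embedding rows; the variable names and data layout are assumptions made for illustration.

```python
# Sketch of additive recovery: answer = argmax_x cos(x, c - a + b), excluding a, b, c.
# `emb` is a (vocab_size x dim) matrix with L2-normalized rows, so dot product = cosine.
import numpy as np

def recover(emb: np.ndarray, vocab: list[str], a: str, b: str, c: str) -> str:
    idx = {w: i for i, w in enumerate(vocab)}
    target = emb[idx[c]] - emb[idx[a]] + emb[idx[b]]
    scores = emb @ (target / np.linalg.norm(target))   # cosine with every word
    for w in (a, b, c):                                # exclude the question words
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]
```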

  23. Analogy Datasets. a is to b as c is to d: 4 words per analogy. Given 3 words (a, b, c), guess the best-suiting d from the entire vocabulary V, excluding the question words a, b, c. MSR: ~8,000 syntactic analogies. Google: ~19,000 syntactic and semantic analogies.
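Given such a recovery function, accuracy on a dataset of 4-word analogies can be computed as below; the (a, b, c, d) tuple layout is an assumed convention for this sketch, not the exact file format of the MSR or Google sets.

```python
# Evaluate analogy accuracy: each item is (a, b, c, d); predict d from a, b, c.
def accuracy(emb, vocab, analogies) -> float:
    correct = sum(recover(emb, vocab, a, b, c) == d for a, b, c, d in analogies)
    return correct / len(analogies)

# Hypothetical usage:
# analogies = [("man", "woman", "king", "queen"), ("good", "best", "strong", "strongest")]
# print(accuracy(emb, vocab, analogies))
```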

  24. Embedding vs Explicit (Round 1)

  25. Embedding vs Explicit (Round 1) [bar chart: accuracy] MSR: Embedding 54%, Explicit 29%. Google: Embedding 63%, Explicit 45%. Many analogies recovered by explicit, but many more by embedding.

  26. Why does vector arithmetic reveal analogies?

  27. Why does vector arithmetic reveal analogies? We wish to find the closest x to c − a + b. This is done with cosine similarity: argmax_x cos(x, c − a + b) = argmax_x [cos(x, c) − cos(x, a) + cos(x, b)]. Problem: one similarity might dominate the rest.

  28. Why does vector arithmetic reveal analogies? We wish to find the closest x to c − a + b

  29. Why does vector arithmetic reveal analogies? We wish to find the closest x to c − a + b. This is done with cosine similarity: argmax_x cos(x, c − a + b) = argmax_x [cos(x, c) − cos(x, a) + cos(x, b)]

  30. Why does vector arithmetic reveal analogies? We wish to find the closest x to c − a + b. This is done with cosine similarity: argmax_x cos(x, c − a + b) = argmax_x [cos(x, c) − cos(x, a) + cos(x, b)]

  31. Why does vector arithmetic reveal analogies? We wish to find the closest x to c − a + b. This is done with cosine similarity: argmax_x cos(x, c − a + b) = argmax_x [cos(x, c) − cos(x, a) + cos(x, b)]. vector arithmetic = similarity arithmetic

  32. Why does vector arithmetic reveal analogies? We wish to find the closest x to c − a + b. This is done with cosine similarity: argmax_x cos(x, c − a + b) = argmax_x [cos(x, c) − cos(x, a) + cos(x, b)]. vector arithmetic = similarity arithmetic

  33. Why does vector arithmetic reveal analogies? We wish to find the closest x to king − man + woman. This is done with cosine similarity: argmax_x cos(x, king − man + woman) = argmax_x [cos(x, king) − cos(x, man) + cos(x, woman)]. vector arithmetic = similarity arithmetic

  34. Why does vector arithmetic reveal analogies? We wish to find the closest x to king − man + woman. This is done with cosine similarity: argmax_x cos(x, king − man + woman) = argmax_x [cos(x, king) − cos(x, man) + cos(x, woman)]. Intuitively, cos(x, king) asks "royal?" and cos(x, woman) asks "female?". vector arithmetic = similarity arithmetic
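The equality on these slides holds, up to a positive constant, whenever all vectors are normalized to unit length: cos(x, v) is then just the dot product x·v, which distributes over c − a + b. A small numeric check with random unit vectors standing in for real embeddings (purely illustrative):

```python
# Check: for unit vectors, ranking by cos(x, c - a + b) equals ranking by
# cos(x, c) - cos(x, a) + cos(x, b), since they differ by the constant ||c - a + b||.
import numpy as np

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
a, b, c = (unit(rng.normal(size=50)) for _ in range(3))
cands = np.stack([unit(rng.normal(size=50)) for _ in range(1000)])

target = c - a + b
lhs = cands @ target / np.linalg.norm(target)      # cos(x, c - a + b)
rhs = cands @ c - cands @ a + cands @ b            # similarity arithmetic

print(np.argmax(lhs) == np.argmax(rhs))                 # True: same winner
print(np.allclose(lhs * np.linalg.norm(target), rhs))   # True: equal up to the constant
```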

  35. What does each similarity term mean? Observe the joint features with explicit representations! queen ∩ king: uncrowned, majesty, second, … queen ∩ woman: Elizabeth, Katherine, impregnate, …
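One way to "observe the joint features" is simply to intersect the non-zero contexts of two sparse vectors and rank the shared contexts; the dictionaries below reuse the context words named on the slide, but the PPMI-like weights are invented for illustration.

```python
# Sketch: shared contexts of two explicit (sparse) vectors, ranked by combined weight.
queen = {"uncrowned": 3.1, "majesty": 2.8, "second": 1.9,
         "Elizabeth": 2.5, "Katherine": 1.7, "impregnate": 1.2}
king  = {"uncrowned": 2.9, "majesty": 3.0, "second": 2.2, "throne": 2.6}
woman = {"Elizabeth": 2.0, "Katherine": 1.8, "impregnate": 1.5, "she": 2.4}

def joint_features(u: dict, v: dict) -> list[str]:
    return sorted(set(u) & set(v), key=lambda ctx: u[ctx] * v[ctx], reverse=True)

print(joint_features(queen, king))    # ['uncrowned', 'majesty', 'second']
print(joint_features(queen, woman))   # ['Elizabeth', 'Katherine', 'impregnate']
```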

  36. Can we do better?

  37. Let's look at some mistakes

  38. Let's look at some mistakes: England − London + Baghdad = ?

  39. Let's look at some mistakes: England − London + Baghdad = Iraq

  40. Let's look at some mistakes: England − London + Baghdad = Mosul?

  41. The Additive Objective. cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65; cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

  42. The Additive Objective. cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65; cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

  43. The Additive Objective. cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65; cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

  44. The Additive Objective. cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65; cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

  45. The Additive Objective. cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65; cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

  46. The Additive Objective. cos(Iraq, England) − cos(Iraq, London) + cos(Iraq, Baghdad) = 0.15 − 0.13 + 0.63 = 0.65; cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74. Problem: one similarity might dominate the rest. Much more prevalent in explicit representations. Might explain why explicit underperformed.
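A tiny calculation that reproduces the numbers above and makes the domination problem explicit (cosine values copied from the slide):

```python
# Additive objective scores for the England - London + Baghdad example.
iraq  = {"England": 0.15, "London": 0.13, "Baghdad": 0.63}
mosul = {"England": 0.13, "London": 0.14, "Baghdad": 0.75}

additive = lambda s: s["England"] - s["London"] + s["Baghdad"]
print(round(additive(iraq), 2), round(additive(mosul), 2))   # 0.65 0.74
# Mosul wins only because its single large cos(x, Baghdad) term dominates the sum.
```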

  47. How can we do better?

  48. How can we do better? Instead of adding similarities, multiply them!

  49. How can we do better? Instead of adding similarities, multiply them! argmax_x [cos(x, c) · cos(x, b)] / [cos(x, a) + ε] (ε avoids division by zero)
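A minimal sketch of this multiplicative idea (3CosMul in the paper), applied to the cosine values from the England − London + Baghdad example. The paper's exact formulation also shifts cosines into a nonnegative range, which is omitted here, so the numbers are only illustrative.

```python
# Multiplicative score: cos(x, c) * cos(x, b) / (cos(x, a) + eps), with
# a = London, b = England, c = Baghdad in this example; eps avoids division by zero.
EPS = 1e-3

def mul_score(sim_c: float, sim_a: float, sim_b: float, eps: float = EPS) -> float:
    return (sim_c * sim_b) / (sim_a + eps)

print(mul_score(sim_c=0.63, sim_a=0.13, sim_b=0.15))   # candidate Iraq:  ~0.72
print(mul_score(sim_c=0.75, sim_a=0.14, sim_b=0.13))   # candidate Mosul: ~0.69
# Multiplying balances the three terms, so the single large cos(x, Baghdad)
# similarity no longer dominates on its own; with these values the intended
# answer comes out on top.
```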
