1. A Compositional and Interpretable Semantic Space
Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell
Carnegie Mellon University
amfyshe@gmail.com
 
2. VSMs and Composition
[Figure: a 2D semantic space with the words pear, lettuce, orange, apple, and carrots plotted as points.]
 
3. How to Make a VSM
[Diagram: Count → Corpus Statistics (many columns) → Dimensionality Reduction → VSM (few columns).]

4. VSMs and Composition
[Figure: the same 2D semantic space, now with the phrase "seedless orange" plotted alongside pear, lettuce, orange, apple, and carrots.]

5. VSMs and Composition
f(adjective, noun) = estimate, which is compared against the observed statistics:
f(stats for "seedless", stats for "orange") ≈ observed stats for "seedless orange"

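To make the setup concrete, here is a minimal sketch of scoring a candidate composition function f against the observed phrase statistics (the cosine comparison and all names are illustrative assumptions, not the paper's protocol):

```python
import numpy as np

def composition_score(f, adj_vec, noun_vec, observed_phrase_vec):
    """Cosine similarity between f's estimate and the observed phrase vector."""
    estimate = f(adj_vec, noun_vec)
    return estimate @ observed_phrase_vec / (
        np.linalg.norm(estimate) * np.linalg.norm(observed_phrase_vec))

# e.g., plain vector addition as f:
# composition_score(lambda a, n: a + n, seedless_vec, orange_vec, seedless_orange_vec)
```
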
6. Previous Work
What is f? (Mitchell & Lapata, 2010; Baroni & Zamparelli, 2010; Blacoe & Lapata, 2012; Socher et al., 2012; Dinu et al., 2013; Hermann & Blunsom, 2013)
Which VSMs are best for composition? (Turney, 2012, 2013; Fyshe et al., 2013; Baroni et al., 2014)

7. Our Contributions
Can we learn a VSM that
- is aware of the composition function?
- is interpretable? (e.g., a dimension meaning "is edible")
 
8. How to Make a VSM
- Corpus: 16 billion words, 50 million documents
- Count dependency arcs in sentences (MALT dependency parser)
- Weight counts with Positive Pointwise Mutual Information (PPMI)
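
As an illustration of the last step, a minimal sketch of the PPMI transform over a word-by-context count matrix (dense NumPy arrays and the function name are assumptions for brevity):

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information for a word-by-context count matrix."""
    total = counts.sum()
    p_word = counts.sum(axis=1, keepdims=True) / total   # P(word)
    p_ctx = counts.sum(axis=0, keepdims=True) / total    # P(context)
    p_joint = counts / total                             # P(word, context)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_joint / (p_word * p_ctx))
    pmi[~np.isfinite(pmi)] = 0.0                         # zero counts contribute nothing
    return np.maximum(pmi, 0.0)                          # clip negative PMI to zero
```
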
9. Matrix Factorization in VSMs
X ≈ A × D: the corpus-statistics matrix X (words × c statistics) is factored into A (words × latent dimensions) and D (latent dimensions × c statistics); A is the VSM.
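
For the SVD baseline, the dimensionality-reduction step can be sketched with a truncated SVD of the statistics matrix (rank k = 300 and the SciPy routine are assumptions, not the authors' exact setup):

```python
from scipy.sparse.linalg import svds

def svd_vsm(stats_matrix, k=300):
    """Truncated SVD: scaled left singular vectors serve as the word embedding A."""
    u, s, vt = svds(stats_matrix, k=k)   # top-k singular triplets
    return u * s                         # words x k matrix; vt plays the role of D
```
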
Interpretability
10
 
Latent Dims
 
Words
11. Interpretability
Top words for example SVD dimensions (Fyshe 2013):
- well, long, if, year, watch
- plan, engine, e, rock, very
- get, no, features, music, via
Top words for example word2vec dimensions (pretrained on Google News):
- pleasantries, draft_picks, chairman_Harley_Hotchkiss, windstorm, Vermont_Yankee
- Programme_Producers_AMPTPP, ###/mt, Al_Mehwar, NCWS, Whereas
- Ubiquitous_Sensor_Networks, KTO, discussing, Hibernia_Terra_Nova, NASDAQ_ENWV

12. Non-Negative Sparse Embeddings (Murphy 2012)
X ≈ A × D, with A constrained to be non-negative and sparse.
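
For reference, the NNSE objective of Murphy et al. (2012), with w words, sparsity weight λ, and row-normalized dictionary D (notation adapted to the X ≈ A × D setup above; quoted from memory of that paper, so treat as a close paraphrase):

$$\min_{A,D}\ \sum_{i=1}^{w} \left\lVert X_{i,:} - A_{i,:} D \right\rVert^{2} + \lambda \lVert A_{i,:} \rVert_{1}
\qquad \text{s.t.}\quad D_{j,:} D_{j,:}^{\top} \le 1\ \ \forall j, \qquad A_{i,j} \ge 0\ \ \forall i,j$$
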
13. Interpretability
Top words for example SVD dimensions:
- well, long, if, year, watch
- plan, engine, e, rock, very
- get, no, features, music, via
Top words for example NNSE dimensions:
- inhibitor, inhibitors, antagonists, receptors, inhibition
- bristol, thames, southampton, brighton, poole
- delhi, india, bombay, chennai, madras
 
14. A Composition-aware VSM

15. Modeling Composition
Rows of X are words; rows can also be phrases.
[Diagram: X and A each partitioned into row blocks for phrases, adjectives, and nouns.]

16. Modeling Composition
Additional constraint for composition: each phrase p = [w1 w2] gets a row of A that is tied to the rows of its constituent words w1 and w2.
[Diagram: rows of A for the adjective w1, the noun w2, and the phrase p.]
 
17. Weighted Addition
The composition function f is weighted addition: p = α·w1 + β·w2 (Mitchell & Lapata, 2010).
 
18. Modeling Composition
The NNSE loss gains a composition term tying each phrase row of A to the weighted sum of its constituent word rows:
min over D, A, α, β of Σ_i ||X_i − A_i D||² + λ||A_i||₁ + (λ_c/2) Σ over phrases p=(i,j) of ||A_p − (α A_i + β A_j)||²,
subject to the NNSE constraints (A ≥ 0, rows of D bounded).
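
A minimal sketch of evaluating this objective (hyperparameter names and default values are illustrative assumptions):

```python
import numpy as np

def cnnse_loss(X, A, D, phrases, alpha, beta, lam=0.05, lam_c=1.0):
    """Reconstruction + L1 sparsity + composition penalty, as in the stated objective."""
    recon = np.linalg.norm(X - A @ D, "fro") ** 2        # sum_i ||X_i - A_i D||^2
    sparsity = lam * np.abs(A).sum()                     # lam * sum_i ||A_i||_1
    comp = 0.0
    for adj, noun, phr in phrases:                       # each phrase p = (adj, noun)
        resid = A[phr] - (alpha * A[adj] + beta * A[noun])
        comp += resid @ resid                            # ||A_p - (alpha A_i + beta A_j)||^2
    return recon + sparsity + (lam_c / 2.0) * comp
```
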
19. Modeling Composition
Reformulate the loss with a square matrix B: each phrase's row of B holds α in the adjective's column, β in the noun's column, and −1 in the phrase's own column, so the composition penalty becomes (λ_c/2)||BA||²_F.
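
A minimal sketch of building B and computing the penalty in this form (index layout and names are assumptions for illustration):

```python
import numpy as np

def composition_penalty(A, phrases, alpha, beta, lam_c):
    """Composition penalty (lam_c/2)*||B A||_F^2 for phrases = [(adj, noun, phr), ...]."""
    n = A.shape[0]
    B = np.zeros((n, n))
    for adj, noun, phr in phrases:
        B[phr, adj] = alpha    # alpha in the adjective's column
        B[phr, noun] = beta    # beta in the noun's column
        B[phr, phr] = -1.0     # -1 in the phrase's own column
    # Each nonzero row of B @ A equals alpha*A[adj] + beta*A[noun] - A[phr]
    return (lam_c / 2.0) * np.linalg.norm(B @ A, "fro") ** 2
```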
 
20. Modeling Composition
 
21. Optimization
Online Dictionary Learning Algorithm (Mairal 2010):
- Solve for D with gradient descent
- Solve for A with ADMM (Alternating Direction Method of Multipliers)
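
A minimal sketch of a per-row A-update via ADMM (the splitting, ρ, and iteration count are illustrative assumptions; the paper follows Mairal et al.'s online scheme rather than exactly this):

```python
import numpy as np

def admm_row_update(x, D, lam=0.05, rho=1.0, iters=50):
    """Solve min_a ||x - a D||^2 + lam*||a||_1 s.t. a >= 0, for one row a of A."""
    k = D.shape[0]
    G = 2.0 * D @ D.T + rho * np.eye(k)                        # quadratic term, fixed across iters
    z = np.zeros(k)
    u = np.zeros(k)                                            # scaled dual variable
    for _ in range(iters):
        a = np.linalg.solve(G, 2.0 * x @ D.T + rho * (z - u))  # ridge-like subproblem
        z = np.maximum(0.0, a + u - lam / rho)                 # soft-threshold, projected onto z >= 0
        u += a - z                                             # dual ascent on the split a = z
    return z
```
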
22. Testing Composition
Composition methods compared:
- W. add: weighted addition of SVD word vectors (w1, w2 → p)
- W. NNSE: weighted addition of NNSE word vectors (rows of A)
- CNNSE: composition trained into the factorization itself

23. Phrase Estimation
Predict the phrase vector, then sort the N test phrases by distance to the estimate; let r be the rank of the correct phrase.
Metrics:
- Rank: r/N × 100
- Reciprocal rank: 1/r
- Percent perfect: δ(r == 1)
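
A minimal sketch of computing these three scores for one predicted phrase (Euclidean distance is an assumption; the paper may rank by a different distance):

```python
import numpy as np

def phrase_estimation_scores(estimate, test_phrases, true_index):
    """Rank-based scores: test_phrases is (N, k); true_index is the target row."""
    dists = np.linalg.norm(test_phrases - estimate, axis=1)  # distance to each candidate
    order = np.argsort(dists)                                # closest first
    r = int(np.where(order == true_index)[0][0]) + 1         # 1-based rank of the truth
    N = test_phrases.shape[0]
    return {"rank": 100.0 * r / N,
            "reciprocal_rank": 1.0 / r,
            "percent_perfect": float(r == 1)}
```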
 
24. Phrase Estimation
[Results chart. Chance levels: rank = 50, reciprocal rank ≈ 0.05, percent perfect = 1%.]
 
25. Interpretable Dimensions
 
26. Interpretability
 
27. Testing Interpretability
Spaces compared: SVD, NNSE, CNNSE.

28. Interpretability
Word intrusion task: select the word that does not belong:
crunchy, gooey, fluffy, crispy, colt, creamy
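
A minimal sketch of assembling one such question from a dimension of A (top-5 words plus a low-scoring intruder is the common recipe; the candidate pool here is an assumption about the exact protocol):

```python
import random
import numpy as np

def make_intruder_question(A, vocab, dim, seed=0):
    """Top-5 words of latent dimension `dim` plus one low-scoring intruder word."""
    rng = random.Random(seed)
    by_score = np.argsort(A[:, dim])[::-1]                   # highest-scoring words first
    top5 = list(by_score[:5])
    intruder = rng.choice(list(by_score[len(vocab) // 2:]))  # drawn from the bottom half
    words = [vocab[i] for i in top5 + [intruder]]
    rng.shuffle(words)
    return words, vocab[intruder]
```
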
29. Interpretability

30. Phrase Representations
[Diagram: for a phrase's row of A, find its top-scoring dimension, then read off that dimension's top-scoring words/phrases.]

31. Phrase Representations
Task: choose the list of words/phrases most associated with the target phrase "digital computers":
- aesthetic, American music, architectural style
- cellphones, laptops, monitors
- both
- neither
 
32. Phrase Representation

33. Testing Phrase Similarity (Mitchell & Lapata, 2010)
- 108 adjective-noun phrase pairs
- Human judgments of similarity on a [1…7] scale
- E.g., important part : significant role (very similar); northern region : early age (not similar)

34. Correlation of Distances
[Diagram: phrase-pair distances under Model A and under Model B are each correlated with the behavioral similarity judgments.]
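
A minimal sketch of this comparison (Spearman rank correlation and Euclidean distance are assumptions about the exact statistics used):

```python
import numpy as np
from scipy.stats import spearmanr

def similarity_correlation(vectors, pairs, human_scores):
    """Correlate model phrase distances with human similarity ratings (1..7)."""
    dists = [np.linalg.norm(vectors[i] - vectors[j]) for i, j in pairs]
    rho, _ = spearmanr(dists, human_scores)  # distance anti-correlates with similarity
    return -rho                              # flip sign so higher = better agreement
```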
 
35. Testing Phrase Similarity
 
36. Interpretability
37. Better than Correlation: Interpretability
(behavioral similarity score 6.33/7)
http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html

38. Better than Correlation: Interpretability
(behavioral similarity score 5.61/7)
http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html
 
39. Summary
Composition awareness improves VSMs:
- Closer to behavioral measures of phrase similarity
- Better phrase representations
- Interpretable dimensions
- Helps to debug composition failures
 
40. Thanks!
www.cs.cmu.edu/~fmri/papers/naacl2015/
amfyshe@gmail.com