Exploring Compositional and Interpretable Semantic Spaces in VSMs

This presentation covers Vector Space Models (VSMs) and composition: how a VSM is built from corpus statistics, previous work on composition functions, matrix factorization, and the interpretability of latent dimensions under SVD and Non-Negative Sparse Embeddings (NNSE). It then asks whether a VSM can be learned that is both aware of the composition function and interpretable, and evaluates the resulting model on phrase estimation, interpretability, and phrase similarity.

Uploaded on Sep 11, 2024

Presentation Transcript

A Compositional and Interpretable Semantic Space Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell Carnegie Mellon University amfyshe@gmail.com 1

VSMs and Composition Example words: lettuce, carrots, apple, pear, orange 2

How to Make a VSM Corpus -> Count -> Statistics (many cols) -> Dim. Reduction -> VSM (few cols) 3

VSMs and Composition Example words and phrase: lettuce, carrots, apple, seedless orange, pear, orange 4

VSMs and Composition f(adjective, noun) = estimate. Example: f(stats for seedless, stats for orange) gives an estimate that is compared with the observed stats for seedless orange 5

Previous Work What is f? (Mitchell & Lapata, 2010; Baroni & Zamparelli, 2010; Blacoe & Lapata, 2012; Socher et al., 2012; Dinu et al., 2013; Hermann & Blunsom, 2013) Which VSMs are best for composition? (Turney, 2012, 2013; Fyshe et al., 2013; Baroni et al., 2014) 6

Our Contributions Can we learn a VSM that is aware of the composition function and is interpretable? 7

How to make a VSM Corpus: 16 billion words, 50 million documents. Count dependency arcs in sentences (MALT dependency parser). Weight counts with Positive Pointwise Mutual Information (PPMI). 8
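The weighting step named on this slide can be sketched as follows. This is a minimal illustration of positive pointwise mutual information on a toy count table, not the authors' pipeline; the function name `ppmi`, the context labels, and the counts are all assumptions made for the example.

```python
import math
from collections import defaultdict

def ppmi(counts):
    """Map {(word, context): count} to {(word, context): PPMI score}."""
    total = sum(counts.values())
    word_totals = defaultdict(float)
    ctx_totals = defaultdict(float)
    for (w, c), n in counts.items():
        word_totals[w] += n
        ctx_totals[c] += n
    scores = {}
    for (w, c), n in counts.items():
        if n <= 0:
            continue  # unseen pairs get no score in this sketch
        # PMI = log( P(w, c) / (P(w) * P(c)) ), written in terms of raw counts
        pmi = math.log((n * total) / (word_totals[w] * ctx_totals[c]))
        scores[(w, c)] = max(0.0, pmi)  # clip negative values to zero
    return scores

# Toy dependency-style counts (hypothetical values).
toy_counts = {
    ("orange", "amod:seedless"): 8,
    ("orange", "amod:big"): 2,
    ("car", "amod:seedless"): 0,
    ("car", "amod:big"): 10,
}
toy_scores = ppmi(toy_counts)
```

Pairs that co-occur less often than chance would predict are clipped to zero, which is what keeps the resulting matrix non-negative.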

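Matrix Factorization in VSMs X ≈ A × D: X holds the corpus stats (one row per word, c columns), A is the VSM (one row per word), and D maps the latent dimensions back to the c corpus-stat columns 9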

Interpretability In A, rows are words and columns are latent dims 10

Interpretability Top words for three example dimensions. SVD (Fyshe 2013): (1) well, long, if, year, watch; (2) plan, engine, e, rock, very; (3) get, no, features, music, via. Word2vec (pretrained on Google News): (1) pleasantries, draft_picks, chairman_Harley_Hotchkiss, windstorm, Vermont_Yankee; (2) Programme_Producers_AMPTPP, ###/mt, Al_Mehwar, NCWS, Whereas; (3) Ubiquitous_Sensor_Networks, KTO, discussing, Hibernia_Terra_Nova, NASDAQ_ENWV 11

Non-Negative Sparse Embeddings X ≈ A × D, with A non-negative and sparse (Murphy 2012) 12
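For reference, the factorization on this slide can be written as an optimization problem. The form below is a sketch of the NNSE objective as it is usually stated for (Murphy 2012), with X the word-by-statistics matrix, A the embedding, and D the dictionary. The sparsity weight \lambda and the row index i over the w words are notation assumed here, not taken from the slides.

```latex
\min_{A, D} \; \sum_{i=1}^{w} \left( \left\lVert X_{i,:} - A_{i,:} D \right\rVert_2^2 + \lambda \left\lVert A_{i,:} \right\rVert_1 \right)
\quad \text{subject to} \quad D_{j,:} D_{j,:}^{\top} \le 1 \;\; \forall j, \qquad A_{i,j} \ge 0 \;\; \forall i, j
```

The L1 penalty encourages sparse rows of A, and the non-negativity constraint means each dimension can only add to a word's representation, never subtract from it.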

Interpretability Top words for three example dimensions. SVD: (1) well, long, if, year, watch; (2) plan, engine, e, rock, very; (3) get, no, features, music, via. NNSE: (1) inhibitor, inhibitors, antagonists, receptors, inhibition; (2) bristol, thames, southampton, brighton, poole; (3) delhi, india, bombay, chennai, madras 13

A Composition-aware VSM 14

Modeling Composition Rows of X are words; they can also be phrases. X and A each have rows for adjectives, nouns, and phrases. 15

Modeling Composition Additional constraint for composition on the rows of A (adjectives, nouns, phrases): for a phrase p = [w1 w2], the row for p is constrained by the rows for w1 and w2 16

Weighted Addition 17
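Weighted addition, the composition function named on this slide, estimates a phrase vector as a weighted sum of its two word vectors. The sketch below is illustrative only: the weights `alpha` and `beta` default to 0.5 as an assumption, and the three-dimensional vectors are hypothetical.

```python
def weighted_addition(w1, w2, alpha=0.5, beta=0.5):
    """Estimate a phrase vector as alpha * w1 + beta * w2."""
    if len(w1) != len(w2):
        raise ValueError("word vectors must have the same length")
    return [alpha * a + beta * b for a, b in zip(w1, w2)]

# Hypothetical vectors for the adjective and the noun.
seedless = [0.0, 0.8, 0.2]
orange = [0.6, 0.4, 0.0]
estimate = weighted_addition(seedless, orange)
```

With non-negative inputs and non-negative weights the estimate stays non-negative, so it lives in the same space as the rows of A.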

Modeling Composition 18

Modeling Composition Reformulate loss with square matrix B applied to A. [Figure: a row of B with entries in the adj. col., noun col., and phrase col.; the figure also carries the label -1] 19

Modeling Composition 20

Optimization Online Dictionary Learning Algorithm (Mairal 2010). Solve for D with gradient descent. Solve for A with ADMM (Alternating Direction Method of Multipliers). 21

Testing Composition Estimate phrase p from w1 and w2 under three models: weighted addition (W. add) with SVD, W. add with NNSE, and CNNSE 22

Phrase Estimation Predict the phrase vector. Sort test phrases by distance to the estimate; r is the rank of the correct phrase and N is the number of test phrases. Metrics: Rank (r/N*100), Reciprocal rank (1/r), Percent Perfect (r==1). 23

Phrase Estimation Chance levels: Rank 50, Percent Perfect 1%, Reciprocal rank ~0.05 24
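The three phrase estimation metrics can be sketched as below. This is an illustration under stated assumptions, not the authors' evaluation code: Euclidean distance is assumed, the candidate vectors are hypothetical, and N = 100 is chosen only because it reproduces chance values close to those on the slide.

```python
import math

def rank_of_target(estimate, candidates, target):
    """1-based rank of `target` when candidates are sorted by distance to `estimate`."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    ordered = sorted(candidates, key=lambda name: dist(estimate, candidates[name]))
    return ordered.index(target) + 1

def summarize(ranks, n):
    """Return (mean rank as a percentage of n, mean reciprocal rank, percent perfect)."""
    rank_pct = sum(r / n * 100 for r in ranks) / len(ranks)
    mrr = sum(1 / r for r in ranks) / len(ranks)
    pct_perfect = 100 * sum(1 for r in ranks if r == 1) / len(ranks)
    return rank_pct, mrr, pct_perfect

# Hypothetical observed phrase vectors.
candidates = {
    "seedless orange": [0.3, 0.6, 0.1],
    "digital computers": [0.9, 0.0, 0.1],
    "early age": [0.0, 0.1, 0.9],
}
r = rank_of_target([0.3, 0.5, 0.1], candidates, "seedless orange")

# Chance levels if every rank from 1 to N is equally likely (N = 100 assumed).
chance = summarize(list(range(1, 101)), 100)
```

Lower is better for the rank percentage, while higher is better for reciprocal rank and percent perfect.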

Interpretable Dimensions 25

Interpretability 26

Testing Interpretability Models compared: SVD, NNSE, and CNNSE 27

Interpretability Select the word that does not belong: crunchy, gooey, fluffy, crispy, colt, creamy 28
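One way to build an item like the one on this slide is to take the top scoring words of a dimension and mix in a low-scoring intruder. The sketch below illustrates that idea only; the rule for picking the intruder (a random word from the lower-scoring half), the function name `intrusion_item`, and the scores are assumptions, not details given in the slides.

```python
import random

def intrusion_item(dimension_scores, top_k=5, seed=0):
    """Build one word-intrusion item from {word: score} for a single dimension."""
    rng = random.Random(seed)
    ranked = sorted(dimension_scores, key=dimension_scores.get, reverse=True)
    top_words = ranked[:top_k]
    # Assumption: the intruder comes from the lower-scoring half of the ranking.
    lower_half = ranked[max(top_k, len(ranked) // 2):]
    intruder = rng.choice(lower_half)
    item = top_words + [intruder]
    rng.shuffle(item)  # hide the intruder's position
    return item, intruder

# Hypothetical scores for one dimension.
scores = {"crunchy": 0.9, "gooey": 0.8, "fluffy": 0.7, "crispy": 0.6,
          "creamy": 0.5, "colt": 0.0, "engine": 0.01, "delhi": 0.02,
          "thames": 0.03, "rock": 0.04}
item, intruder = intrusion_item(scores)
```

If a dimension is coherent, people should be able to spot the intruder easily, so the rate at which they do is a measure of interpretability.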

Interpretability 29

Phrase Representations For a phrase, find its top scoring dimension in A, then list the top scoring words/phrases in that dimension 30
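The lookup described on this slide can be sketched as two small steps: find the phrase's highest scoring dimension, then rank the other rows of A by that dimension. The matrix values and the helper names `top_dimension` and `top_entries` below are hypothetical.

```python
def top_dimension(vector):
    """Index of the largest entry in a row of A."""
    return max(range(len(vector)), key=lambda j: vector[j])

def top_entries(A, dim, k=3, exclude=None):
    """Names of the k rows of A with the highest score in column `dim`."""
    names = [name for name in A if name != exclude]
    return sorted(names, key=lambda name: A[name][dim], reverse=True)[:k]

# Hypothetical rows of A, keyed by word or phrase.
A = {
    "digital computers": [0.1, 0.9, 0.0],
    "cellphones": [0.0, 0.8, 0.1],
    "laptops": [0.0, 0.7, 0.0],
    "monitors": [0.1, 0.6, 0.0],
    "architectural style": [0.9, 0.0, 0.1],
}
dim = top_dimension(A["digital computers"])
neighbours = top_entries(A, dim, exclude="digital computers")
```

Because the rows are sparse and non-negative, the words returned for a dimension serve as a readable summary of what that dimension encodes.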

Phrase Representations Choose the list of words/phrases most associated with the target phrase "digital computers": (a) aesthetic, American music, architectural style; (b) cellphones, laptops, monitors; (c) both; (d) neither 31

Phrase Representation 32

Testing Phrase Similarity 108 adjective-noun phrase pairs with human judgments of similarity on a scale of 1 to 7 (Mitchell & Lapata 2010). E.g. important part : significant role (very similar); northern region : early age (not similar). 33

Correlation of Distances Distances under Model A and under Model B are each correlated with the behavioral data 34
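Comparing a model with behavioral data in this way can be sketched as follows: score each phrase pair with the model, then correlate those scores with the human ratings. The slides do not say which similarity measure or correlation coefficient was used, so cosine similarity and Pearson correlation are assumptions here, and the vectors and ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical phrase-pair vectors and human ratings on the 1-7 scale.
pairs = [
    ([0.9, 0.1, 0.0], [0.8, 0.2, 0.0]),
    ([0.5, 0.5, 0.0], [0.1, 0.6, 0.3]),
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),
]
human = [6.5, 4.0, 1.5]
model = [cosine(u, v) for u, v in pairs]
score = pearson(model, human)
```

Running the same procedure for two models gives one correlation per model, and the model with the higher correlation is closer to the behavioral judgments.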

Testing Phrase Similarity 35

Interpretability 36

Better than Correlation: Interpretability (behav sim score 6.33/7) 37 http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html

Better than Correlation: Interpretability (behav sim score 5.61/7) 38 http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html

Summary Composition awareness improves VSMs: closer to the behavioral measure of phrase similarity, and better phrase representations. Interpretable dimensions help to debug composition failures. 39