Unveiling Polarity with Polarity-Inducing Latent Semantic Analysis

Polarity-Inducing Latent Semantic Analysis (PILSA) introduces a novel vector space model that distinguishes antonyms from synonyms. By encoding polarity information, synonyms cluster closely while antonyms are positioned at opposite ends of a unit sphere. Existing models struggle with finer distinctions such as antonymy, which PILSA addresses effectively. Various applications, from document clustering to word similarity assessment, benefit from such nuanced semantic analysis.


Presentation Transcript


  1. Polarity Inducing Latent Semantic Analysis: a vector space model that can distinguish antonyms from synonyms! Scott Wen-tau Yih, joint work with Geoffrey Zweig & John Platt, Microsoft Research.

  2. Vector Space Model Text objects (e.g., words, phrases, sentences, or documents) are represented as vectors: high-dimensional sparse term-vectors, concept vectors from topic models or projection methods, or vectors constructed compositionally from word vectors [Socher et al. 12]. Relations of the text objects are estimated by functions in the vector space; relatedness is measured by some distance function, e.g., the cosine of the angle between a query vector v_q and a document vector v_d, cos(v_q, v_d).
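A minimal sketch of the cosine relatedness measure mentioned above; the vectors and values here are made up for illustration, not taken from the slides.

```python
import numpy as np

def cosine(v_q, v_d):
    """Cosine of the angle between a query term vector and a document term vector."""
    return float(np.dot(v_q, v_d) / (np.linalg.norm(v_q) * np.linalg.norm(v_d)))

# Toy term vectors over a shared 4-word vocabulary (illustrative only).
v_q = np.array([1.0, 2.0, 0.0, 1.0])
v_d = np.array([0.0, 1.0, 1.0, 1.0])
print(cosine(v_q, v_d))  # ~0.71: the two objects are fairly related
```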

  3. Applications of Vector Space Models Document level: information retrieval [Salton & McGill 83], document clustering [Deerwester et al. 90], search relevance measurement [Baeza-Yates & Ribeiro-Neto 99], cross-lingual document retrieval [Platt et al. 10; Yih et al. 11]. Word level: language modeling [Bellegarda 00], word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12].

  4. Beyond General Similarity Existing VSMs cannot distinguish finer relations; this is the antonym issue of distributional similarity. The co-occurrence and distributional hypotheses apply to near-synonyms, hypernyms, and other semantically related words, including antonyms [Mohammad et al. 08]; e.g., hot and cold occur in similar contexts. LSA does not solve the issue: it might assign a high degree of similarity to opposites as well as synonyms [Landauer & Laham 98].

  5. Approaches for Detecting Antonyms Separate antonyms from distributionally similar word pairs using patterns such as "from X to Y" and "either X or Y" [Lin et al. 03]. WordNet graph: synsets connected by is-a links and exactly one antonymy link [Harabagiu et al. 06]. WordNet + affix rules + heuristics [Mohammad et al. 08]. Distinguishing synonyms and antonyms is still perceived as a difficult open problem [Poon & Domingos 09].
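A rough illustration of the pattern-based filtering idea attributed to Lin et al. 03 on this slide; this is a simplified sketch, not their system, and the patterns and example sentence are assumptions for demonstration.

```python
import re

# Patterns whose X/Y slots tend to hold incompatible (often antonymous) words.
PATTERNS = [
    re.compile(r"\bfrom (\w+) to (\w+)\b"),
    re.compile(r"\beither (\w+) or (\w+)\b"),
]

def incompatible_pairs(text):
    """Collect (X, Y) pairs matching the antonym-indicating patterns."""
    pairs = set()
    for pattern in PATTERNS:
        pairs.update(pattern.findall(text.lower()))
    return pairs

print(incompatible_pairs("The forecast swung from hot to cold overnight."))
# {('hot', 'cold')} -> filter this pair out of a distributionally similar list
```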

  6. Our Contributions Polarity Inducing Latent Semantic Analysis (PILSA): a vector space model that encodes polarity information. Synonyms cluster together in this space; antonyms lie at the opposite ends of a unit sphere. (Illustration: burning and hot sit at one end of the sphere, freezing and cold at the other.)

  7. Our Contributions Polarity Inducing Latent Semantic Analysis (PILSA): a vector space model that encodes polarity information. Synonyms cluster together in this space; antonyms lie at the opposite ends of a unit sphere. Significantly improved the prediction accuracy on a benchmark GRE dataset (64% → 80%).

  8. Roadmap Introduction Polarity Inducing Latent Semantic Analysis Basic construction Extension 1: Improving accuracy Extension 2: Improving coverage Experimental evaluation Task & datasets Results Conclusion

  9. The Core Method Input: a thesaurus (with synonyms & antonyms). Create a document-term matrix in which each group of words (synonyms and antonyms) is treated as a document. Induce polarity by giving antonyms negative weights. Apply SVD as in regular Latent Semantic Analysis.

  10. Matrix Construction
  Acrimony: rancor, conflict, bitterness; goodwill, affection
  Affection: goodwill, tenderness, fondness; acrimony, rancor
  Document = row-vector (one word group); term = column-vector.
  TF-IDF scores, columns (acrimony, rancor, goodwill, affection):
  Group 1: acrimony   (4.73, 6.01, 5.81, 4.86)
  Group 2: affection  (3.78, 5.23, 6.21, 5.15)

  11. Matrix Construction
  Acrimony: rancor, conflict, bitterness; goodwill, affection
  Affection: goodwill, tenderness, fondness; acrimony, rancor
  Inducing polarity: negate the weights of antonym entries, columns (acrimony, rancor, goodwill, affection):
  Group 1: acrimony   (4.73, 6.01, -5.81, -4.86)
  Group 2: affection  (-3.78, -5.23, 6.21, 5.15)
  Cosine score: positive for synonyms, negative for antonyms.
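A short sketch of the polarity-inducing construction on slides 10-11, using the example TF-IDF scores shown above; the antonym mask is read off the two thesaurus entries.

```python
import numpy as np

terms = ["acrimony", "rancor", "goodwill", "affection"]

# TF-IDF weights from slide 10 (rows: Group 1 "acrimony", Group 2 "affection").
tfidf = np.array([
    [4.73, 6.01, 5.81, 4.86],
    [3.78, 5.23, 6.21, 5.15],
])

# True where the term is listed as an antonym of the group's head word.
antonym_mask = np.array([
    [False, False, True,  True ],   # goodwill, affection oppose acrimony
    [True,  True,  False, False],   # acrimony, rancor oppose affection
])

# Inducing polarity: flip the sign of every antonym entry (slide 11).
signed = np.where(antonym_mask, -tfidf, tfidf)
print(signed)
```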

  12. Effect of Inducing Polarity
  Original TF-IDF weights, columns (acrimony, rancor, goodwill, affection):
  Group 1: acrimony   (4.73, 6.01, 5.81, 4.86)
  Group 2: affection  (3.78, 5.23, 6.21, 5.15)

  13. Effect of Inducing Polarity
  Simplified weights without polarity, columns (acrimony, rancor, goodwill, affection):
  Group 1: acrimony   (1, 1, 1, 1)
  Group 2: affection  (1, 1, 1, 1)
  Cosine similarity = 1

  14. Effect of Inducing Polarity
  Simplified weights without polarity, columns (acrimony, rancor, goodwill, affection):
  Group 1: acrimony   (1, 1, 1, 1)
  Group 2: affection  (1, 1, 1, 1)
  Cosine similarity = 1. Cannot distinguish antonyms from synonyms!

  15. Effect of Inducing Polarity
  Without polarity (cosine similarity = 1):
  Group 1: acrimony   (1, 1, 1, 1)
  Group 2: affection  (1, 1, 1, 1)
  With polarity:
  Group 1: acrimony   (1, 1, -1, -1)
  Group 2: affection  (-1, -1, 1, 1)

  16. Effect of Inducing Polarity
  Without polarity:
  Group 1: acrimony   (1, 1, 1, 1)
  Group 2: affection  (1, 1, 1, 1)
  With polarity (cosine similarity = -1):
  Group 1: acrimony   (1, 1, -1, -1)
  Group 2: affection  (-1, -1, 1, 1)
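A tiny check of the effect shown on slides 13-16, using the same simplified weight matrices; with polarity induced, the columns for acrimony and affection flip from cosine +1 to -1.

```python
import numpy as np

def col_cosine(M, i, j):
    a, b = M[:, i], M[:, j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Columns: acrimony, rancor, goodwill, affection.
no_polarity = np.array([[1.0, 1.0, 1.0, 1.0],
                        [1.0, 1.0, 1.0, 1.0]])
with_polarity = np.array([[ 1.0,  1.0, -1.0, -1.0],
                          [-1.0, -1.0,  1.0,  1.0]])

print(col_cosine(no_polarity, 0, 3))    #  1.0 -> acrimony looks like a synonym of affection
print(col_cosine(with_polarity, 0, 3))  # -1.0 -> correctly pushed to the opposite pole
```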

  17. Mapping to Latent Space via SVD Apply SVD to the signed document-term matrix W: W ≈ U Σ V^T. Word similarity: cosine of two columns in Σ V^T. SVD generalizes and smooths the original data; it uncovers relationships not explicit in the thesaurus.

  18. Mapping to Latent Space via SVD As U^T W = Σ V^T, the matrix U^T can be viewed as the projection matrix that maps a raw d×1 column-vector (a word's column in W) to the k-dimensional latent space.
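A sketch of the SVD step on slides 17-18, applied to the signed example matrix built earlier; the truncation level k is illustrative (the slides tune the dimensionality on dev data).

```python
import numpy as np

# Signed group-by-term matrix W from the matrix-construction sketch.
W = np.array([[ 4.73,  6.01, -5.81, -4.86],
              [-3.78, -5.23,  6.21,  5.15]])

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2                                   # latent dimensionality (illustrative)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

latent_words = S_k @ Vt_k               # one k-dim latent vector per word (columns)
projected = U_k.T @ W                   # same vectors via the projection matrix U_k.T
assert np.allclose(latent_words, projected)

# Word similarity is the cosine of two columns of latent_words.
```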

  19. Extension 1: Improve Accuracy Refine the projection matrix by discriminative training. S2Net [Yih et al. 11] is very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors: a raw term vector v is mapped to a concept vector A^T v by the projection matrix A, and two text objects are compared by sim(A^T v_p, A^T v_q), the cosine of their concept vectors.

  20. Applying S2Net Training data: antonym pairs from the thesaurus. Initialize the model with the PILSA projection matrix. Learning objective: the cosine score of an antonym pair should be lower than that of other word pairs. Δ = cos(A^T v_x, A^T v_y) [other word pair] - cos(A^T v_a, A^T v_b) [antonyms]; loss L(Δ; γ) = log(1 + exp(-γΔ)).
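A minimal sketch of the slide-20 objective, assuming a projection matrix A and raw term vectors; the margin scale gamma and the toy data are illustrative, not values from the paper.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_loss(A, v_x, v_y, v_a, v_b, gamma=10.0):
    """L(delta; gamma) = log(1 + exp(-gamma * delta)), where delta is the cosine
    of some other word pair minus the cosine of an antonym pair; the loss is
    small when the antonym pair scores clearly lower."""
    delta = cos(A.T @ v_x, A.T @ v_y) - cos(A.T @ v_a, A.T @ v_b)
    return float(np.log1p(np.exp(-gamma * delta)))

# Toy usage: random raw vectors and a random A; in PILSA, A is initialized
# from the SVD projection matrix and then refined by gradient training.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
v_x, v_y, v_a, v_b = (rng.normal(size=6) for _ in range(4))
print(pair_loss(A, v_x, v_y, v_a, v_b))
```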

  21. Extension 2: Improve Coverage What to do with out-of-thesaurus words? Some are lexical variations: the Encarta thesaurus contains corruptible and corruption, but not corruptibility; morphological analysis and stemming can find alternatives of an out-of-thesaurus target word. Others are rare or offensive words, e.g., froward and moronic; embed these out-of-thesaurus words by leveraging a general corpus.

  22. Embedding Out-of-thesaurus Words Create a context vector space model using a collection of documents (e.g., Wikipedia); context: words within a window of [-10, 10]. Embed the target word into the PILSA space by k-NN: find nearby in-thesaurus words in the context space, remove words with inconsistent polarity, and use the centroid of the corresponding PILSA vectors to represent the target word.

  23. Embedding Out-of-thesaurus Words Create a context vector space model using a collection of documents (e.g., Wikipedia); context: words within a window of [-10, 10]. Embed the target word into the PILSA space by k-NN. (Illustration: a target word such as sweltering is placed near its context-space neighbors hot and burning, and opposite cold, when mapped from the context vector space into the PILSA space.)
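A sketch of the k-NN embedding step on slides 22-23, assuming we already have context-space vectors for all words and PILSA vectors for in-thesaurus words; all names here are illustrative, and the polarity-consistency filter mentioned on slide 22 is only noted in a comment.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed_oov(target, context_vecs, pilsa_vecs, k=5):
    """Place an out-of-thesaurus word in PILSA space via its context-space neighbors."""
    t = context_vecs[target]
    in_thesaurus = [w for w in context_vecs if w != target and w in pilsa_vecs]
    # k nearest in-thesaurus neighbors of the target, by cosine in the context space.
    neighbors = sorted(in_thesaurus, key=lambda w: cos(t, context_vecs[w]), reverse=True)[:k]
    # (Slide 22 also removes neighbors with inconsistent polarity; omitted here.)
    # The centroid of the neighbors' PILSA vectors represents the target word.
    return np.mean([pilsa_vecs[w] for w in neighbors], axis=0)
```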

  24. Roadmap Introduction Polarity Inducing Latent Semantic Analysis Basic construction Extension 1: Improving accuracy Extension 2: Improving coverage Experimental evaluation Task & datasets Results Conclusion

  25. Data for Building PILSA Models Encarta Thesaurus (for basic PILSA): 47k word categories (i.e., the documents), a vocabulary of 50k words, and 125,724 pairs of antonyms. Wikipedia (for embedding out-of-thesaurus words): sentences from a Nov-2010 snapshot, 917M words after preprocessing.

  26. Experimental Evaluation Task: GRE closest-opposite questions, e.g., which is the closest opposite of adulterate? (a) renounce (b) forbid (c) purify (d) criticize (e) correct. Dev / Test: 162 / 950 questions [Mohammad et al. 08]; the dev set is used for tuning the dimensionality of PILSA. Evaluation metric: accuracy = #correct / #total questions; questions with unresolved out-of-thesaurus target words are treated as answered incorrectly.
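A small sketch of how such a question can be answered with PILSA vectors: choose the candidate whose cosine with the target is lowest (most negative). The lookup table pilsa_vectors is assumed to exist; it is not data from the slides.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def closest_opposite(target, candidates, vecs):
    """Return the candidate with the lowest cosine to the target in PILSA space."""
    return min(candidates, key=lambda c: cos(vecs[target], vecs[c]))

# Usage (hypothetical vectors):
# answer = closest_opposite("adulterate",
#                           ["renounce", "forbid", "purify", "criticize", "correct"],
#                           pilsa_vectors)   # expected: "purify"
```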

  27. Results on Test Set (Bar chart of test-set accuracy for Lookup, Raw TFIDF, PILSA, PILSA+S2Net, OOV Embedding, and Mohammad et al. 08; accuracies range from 0.56 to 0.80, with the full system reaching 0.80 versus 0.64 for Mohammad et al. 08.)

  28. Examples Target word: admirable. No polarity (LSA): most similar: commendable, creditable, despicable; least similar: uninviting, dessert, seductive. With polarity (PILSA): most similar: commendable, creditable, laudable; least similar: despicable, shameful, unworthy. Full results on the GRE test set are available online.

  29. Conclusion Polarity Inducing LSA solves the open problem of antonyms vs. synonyms by constructing a vector space that can distinguish opposites: the space is designed so that synonyms/antonyms tend to have positive/negative cosine similarity. Future work: new methods or representations for other word relations (e.g., Part-Whole, Is-A, Attribute) and applications (e.g., Textual Entailment or Sentence Completion).
