Exploring Word Embeddings and Syntax Encoding

 
 
How much do word embeddings
encode about syntax?
 
Jacob Andreas and Dan Klein
UC Berkeley
 
Everybody loves word embeddings

[Figure: 2D visualization of word embeddings for the determiners few, most, that, the, a, each, this, every]

[Collobert 2011, Mikolov 2013, Freitag 2004, Schuetze 1995, Turian 2010]
 
What might embeddings bring?

Cathleen complained about the magazine’s shoddy editorial quality .

[Figure: embedding neighbors annotated on the sentence — Mary near the unknown word Cathleen; executive and average near editorial]
Today’s question
 
Can word embeddings
trained with surface context
improve a state-of-the-art
constituency parser?
 
(no)
 
 
 
Embeddings and parsing
 
Pre-trained word embeddings are useful for a variety of NLP tasks
Can they improve a constituency parser?
(not very much)
[Cite XX, Cite XX, Cite XX]
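
The sketches later in this write-up assume a simple embeddings dictionary mapping each word to a vector. A hypothetical loader for the common one-word-per-line text format might look like the following; the file format, path handling, and function name are assumptions for illustration, not part of the talk or any specific toolkit's API.

# Hypothetical helper for the sketches below: read "word v1 v2 ... vd" lines
# into a dict of numpy vectors.
import numpy as np

def load_embeddings(path):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 2:
                continue
            word, values = parts[0], parts[1:]
            embeddings[word] = np.array([float(x) for x in values])
    return embeddings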
 
Three hypotheses
Vocabulary expansion
(good for OOV words)
Statistic pooling
(good for medium-frequency words)
Embedding structure
(good for features)
 
 
Vocabulary expansion hypothesis:
Embeddings help handling of
out-of-vocabulary words
 
 
Vocabulary expansion

[Figure: embedding space in which the unseen word Cathleen sits near the known names John, Mary, and Pierre, apart from the adjectives yellow, enormous, and hungry]
 
Vocabulary expansion

Cathleen complained about the magazine’s shoddy editorial quality .

[Figure: the out-of-vocabulary word Cathleen is replaced by its nearest in-vocabulary neighbor, Mary]
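
As a concrete (and hypothetical) illustration of this idea, the sketch below replaces each out-of-vocabulary token with its nearest in-vocabulary neighbor by cosine similarity, so a sentence containing Cathleen would be parsed as if it contained Mary. The embeddings mapping, the train_vocab set, and the function names are assumptions for the example, not the interface of the code released with the talk.

import numpy as np

def nearest_in_vocab(word, embeddings, train_vocab):
    """Return the training-vocabulary word whose embedding is closest to word's."""
    if word not in embeddings:
        return word  # no vector either: fall back to the parser's own OOV handling
    v = embeddings[word]
    best, best_sim = word, -1.0
    for cand in train_vocab:
        if cand not in embeddings:
            continue
        u = embeddings[cand]
        sim = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
        if sim > best_sim:
            best, best_sim = cand, sim
    return best

def expand_vocabulary(sentence, embeddings, train_vocab):
    """Map OOV tokens to in-vocabulary stand-ins, e.g. Cathleen -> Mary."""
    return [w if w in train_vocab else nearest_in_vocab(w, embeddings, train_vocab)
            for w in sentence]

In practice one would precompute normalized vectors or use an approximate nearest-neighbor index rather than the linear scan shown here.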
 
Vocab. expansion results

[Bar chart, full training set: Baseline F1 91.13, +OOV 91.22]

Vocab. expansion results (300 sentences)

[Bar chart: Baseline F1 71.88, +OOV 72.20]
 
Statistic pooling hypothesis:
Embeddings help handling of
medium-frequency words
 
 
Statistic pooling

[Figure: tag sets observed in training — executive {NN, JJ}, kind {NN}, giant {NN, JJ}, editorial {JJ}, average {NN}]

Statistic pooling

[Figure: after pooling statistics across embedding neighbors, each of these words carries the full tag set {NN, JJ}]

Statistic pooling

[Figure: with pooled statistics, editorial — seen only as JJ in training — can also be analyzed as NN]
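
A minimal sketch of what such pooling could look like, assuming a tag_counts dictionary from words to their observed tag counts; the neighbor count k and discount alpha are illustrative parameters rather than the settings used in the experiments. The effect is that a word observed only as JJ can inherit some NN mass from embedding neighbors like executive and giant.

from collections import Counter
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def pooled_tag_counts(word, tag_counts, embeddings, k=5, alpha=0.1):
    """tag_counts maps word -> {tag: count}; returns smoothed counts for word."""
    pooled = Counter(tag_counts.get(word, {}))
    if word not in embeddings:
        return pooled
    neighbours = sorted(
        (w for w in tag_counts if w != word and w in embeddings),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
        reverse=True,
    )[:k]
    for n in neighbours:
        for tag, count in tag_counts[n].items():
            # borrow a discounted share of the neighbour's tag statistics
            pooled[tag] += alpha * count
    return pooled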
 
Statistic pooling results

[Bar chart, full training set: Baseline F1 91.13, +Pooling 91.11]

Statistic pooling results (300 sentences)

[Bar chart: Baseline F1 71.88, +Pooling 72.21]
 
 
Embedding structure hypothesis:
The organization of the embedding space
directly encodes useful features
 
 
Embedding structure

[Figure: verb embeddings — vanished, dined, devoured, assassinated, vanishing, dining, devouring, assassinating — arranged along “tense” and “transitivity” directions; the past-tense forms share the tag VBD]

[Huang 2011]
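
One simple way to operationalize this hypothesis is to expose a word's embedding coordinates directly as discrete lexical features, for example by bucketing each dimension by sign, and let a feature-based lexicon learn weights over them; directions such as “tense” or “transitivity” would then be picked up implicitly. The sketch below is an assumed illustration, not the feature templates actually evaluated.

def embedding_features(word, embeddings, n_dims=10):
    """Return indicator features such as 'dim3:+' from the word's embedding."""
    if word not in embeddings:
        return ["emb:unk"]
    v = embeddings[word]
    return ["dim%d:%s" % (i, "+" if v[i] >= 0 else "-")
            for i in range(min(n_dims, len(v)))]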
 
Embedding structure results

[Bar chart, full training set: Baseline F1 91.13, +Features 91.08]

Embedding structure results (300 sentences)

[Bar chart: Baseline F1 71.88, +Features 70.32]
 
 
To summarize

[Bar chart, 300 sentences: Baseline, +OOV, +Pooling, +Features]
 
Combined results

[Bar chart, full training set: Baseline vs. +OOV +Pooling; F1 values 90.11 and 90.70]

Combined results (300 sentences)

[Bar chart: Baseline F1 71.88, +OOV +Pooling 72.21]
 
What about…
 
Domain adaptation?
 
(no significant gain)
French?
 
(no significant gain)
Other kinds of embeddings?
 
(no significant gain)
 
Why didn’t it work?

Context clues often provide enough information to reason around words with incomplete / incorrect statistics
The parser already has robust OOV and small-count models
Sometimes “help” from embeddings is worse than nothing:
  bifurcate → Soap
  homered → Paschi
  tuning → unrecognized
 
What about other parsers?

Dependency parsers
(continuous repr. as syntactic abstraction)

Neural networks
(continuous repr. as structural requirement)

[Henderson 2004, Socher 2013, Koo 2008, Bansal 2014]
 
What didn’t we try?
Hard clustering
(some evidence that this is useful for
morphologically rich languages)
A nonlinear feature-based model
Embeddings in higher constituents
(e.g. in a CRF parser)
[Candito 09]
 
 
Conclusion
 
Embeddings provide no apparent benefit to a state-of-the-art parser for:
OOV handling
Parameter pooling
Lexicon features
 
Code online at 
http://cs.berkeley.edu/~jda
Word embeddings are used throughout natural language processing, which raises the question of how much syntax they actually encode. In this talk, Jacob Andreas and Dan Klein (UC Berkeley) test whether embeddings trained on surface context can improve a state-of-the-art constituency parser, examining three hypotheses: vocabulary expansion for out-of-vocabulary words, statistic pooling for medium-frequency words, and embedding structure as a direct source of features. None of the three yields a significant improvement.
