Neural Shift-Reduce Dependency Parsing in Natural Language Processing
This content explores the concept of Shift-Reduce Dependency Parsing in the context of Natural Language Processing. It describes how a Shift-Reduce Parser incrementally builds a parse without backtracking, maintaining a buffer of input words and a stack of constructed constituents. The process involves shifting words onto the stack and reducing elements to form constituents in a deterministic manner. A sample parse of the sentence "Bob eats pasta" is used to illustrate the actions of shifting and reducing constituents within the parser to generate a parse tree.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
CS 388: Natural Language Processing: Neural Shift-Reduce Dependency Parsing Raymond J. Mooney University of Texas at Austin 1
Shift Reduce Parser Deterministically builds a parse incrementally, bottom up, and left to right, without backtracking. Maintains buffer of input words and a stack of constructed constituents. Perform sequence of operations/actions: Shift: Push the next word in the buffer onto the stack. Reduce: Replace a set of the top elements on the stack with a constituent composed of them. 2
Sample Parse of Bob eats pasta Buffer: Bob eats pasta Stack 3
Sample Parse of Bob eats pasta Action: Shift Buffer: eats pasta Stack Bob 4
Sample Parse of Bob eats pasta Action: Reduce(Bob NP) Buffer: eats pasta Stack (NP Bob) 5
Sample Parse of Bob eats pasta Action: Shift Buffer: pasta Stack eats (NP Bob) 6
Sample Parse of Bob eats pasta Action: Reduce(eats VB) Buffer: pasta Stack (VB eats) (NP Bob) 7
Sample Parse of Bob eats pasta Action: Shift Buffer: Stack pasta (VB eats) (NP Bob) 8
Sample Parse of Bob eats pasta Action: Reduce(pasta NP) Buffer: Stack (NP pasta) (VB eats) (NP Bob) 9
Sample Parse of Bob eats pasta Action: Reduce(VB NP VP) Buffer: Stack (VP (VB eats)(NP pasta)) (NP Bob) 10
Sample Parse of Bob eats pasta Action: Reduce(S NP VP) Buffer: Stack (S (NP Bob) (VP (VB eats)(NP pasta))) 11
Shift Reduce Parsing Must use look ahead to use next words in the buffer to pick the correct action. Originally introduced to parse programming languages which are DCFLs. Use for NLP requires heuristics to pick an action at each step which (due to ambiguity) could be wrong, resulting in a garden path. Can perform backup when an impasse is reached in order to search for a parse. 12
Shift-Reduce Dependency Parser Easily adapted to dependency parsing by using reduce operators that introduce dependency arcs. In addition to a stack and buffer, maintain a set of dependency arcs created. 13
Arc-Standard System (Nivre, 2004) Buffer b = [b1, b2, bn] Stack s = [s1, s2, sm] Arcs A = {label(wi, wj), } Configuration c = (s, b, A) Initial Config: ([ROOT], [w1, w2, wn], {}) Final Config: ([ROOT], [],{label(wi, wj), }) 14
Sample Parse of He has good control Stack ROOT Arcs Buffer: [He, has, good, control] 16
Sample Parse of He has good control Action: Shift Stack He Arcs Buffer: [has, good, control] ROOT 17
Sample Parse of He has good control Action: Shift Stack has Arcs Buffer: [good, control] He ROOT 18
Sample Parse of He has good control Action: LeftArc(nsubj) Stack has Arcs Buffer: [good, control] nsubj(has,He) ROOT 19
Sample Parse of He has good control Action: Shift Stack good Arcs nsubj(has,He) Buffer: [control] has ROOT 20
Sample Parse of He has good control Action: Shift Stack control Arcs Buffer: [] nsubj(has,He) good has ROOT 21
Sample Parse of He has good control Action: LeftArc(amod) Stack control Arcs Buffer: [] nsubj(has,He) amod(control,good) has ROOT 22
Sample Parse of He has good control Action: RightArc(dobj) Stack has Arcs Buffer: [] nsubj(has,He) amod(control,good) dobj(has,control) ROOT 23
Sample Parse of He has good control Action: RightArc(root) Stack ROOT Arcs Buffer: [] nsubj(has,He) amod(control,good) dobj(has,control) root(ROOT,has) 24
Stanford Neural Dependency Parser (Chen and Manning, 2014) Train a neural net to choose the best shift- reduce parser action to take at each step. Uses features (words, POS tags, arc labels) extracted from the current stack, buffer, and arcs as context. History (thru citation trail): Neural shift-reduce parser (Mayberry & Miikkulainen, 1999) Decision-tree shift-reduce parser (Hermjakob & Mooney, 1997) Simple learned shift-reduce parser (Simmons & Yu, 1992) 25
Neural Architecture Parse action classification 26
Context Features Used (rc = right-child, lc=left-child) The top 3 words on the stack and buffer: s1; s2; s3; b1; b2; b3; The first and second leftmost / rightmost children of the top two words on the stack: lc1(si); rc1(si); lc2(si); rc2 (si), i = 1; 2. The leftmost-of-leftmost and rightmost-of- rightmost children of the top two words on the stack: lc1(lc1(si)); rc1(rc1(si)), i = 1; 2. Also include the POS tag and parent arc label (where available) for these same items. 27
Input Embeddings Instead of using one-hot input encodings, words and POS tags are embedded in a 50 dimensional set of input features. Embedding POS tags is unusual since there are relatively few; however, it allows similar tags (e.g. NN and NNS) to have similar embeddings and thereby behave similarly. 28
Cube Activation Function Alternative non-linear output function instead of sigmoid (softmax) or tanh. Allows modeling the product terms of xixjxk for any three different input elements. Based on previous empirical results, capturing interactions of three elements seems important for shift-reduce dependency parsing. 29
Training Data Automatically construct dependency parses from treebank phrase-structure parse trees. Compute correct sequence of oracle shift- reduce parse actions (transitions, ti) at each step from gold-standard parse trees. Determine correct parse sequence by using a shortest stack oracle which always prefers LeftArc over Shift. 30
Training Algorithm Training objective is to minimize the cross- entropy loss, plus a L2-regularization term: Initialize word embeddings to precomputed values such as Word2Vec. Use AdaGrad with dropout to compute model parameters that approximately minimize this objective. 31
Evaluation Metrics for Dependency Parsing Unlabeled Atachment Score (UAS): % of tokens for which a system has predicted the correct parent. Labeled Atachment Score (LAS): % of tokens for which a system has predicted the correct parent with the correct arc label. 32
Conclusions Shift-reduce parsing is an efficient and effective alternative to standard PCFG parsing. Particularly effective for dependency parsing. Models deterministic, left-to-right parsing that seems to characterize human parsing (therefore subject to garden paths). Neural methods to select parse operations give state-of-the-art results. 34