Understanding Discourse Coherence and Annotation in PDTB
NLP research on discourse coherence explores relations between events and propositions expressed in text, with a focus on combining individual relations into complex coherence structures. The PDTB approach annotates low-level relations in corpora to derive emergent high-level structural representations. This involves identifying discourse relations, arguments, and sense classifications, leading to a deeper understanding of text coherence.
Discourse Annotation in the PDTB-3: The Next Generation. Rashmi Prasad, Bonnie Webber, Alan Lee, *Aravind Joshi
Outline: Introduction; Discourse Coherence and its annotation; PDTB Basics; PDTB enrichment; Motivation; New relations; Sense revisions; Guidelines revisions; Mapping to ISO-DR-Core; Conclusion; Separate effort for (really) full-text annotation.
Discourse Coherence and its Annotation. Since the 1980s, NLP has seen work on discourse coherence as a function of relations between eventualities and propositions (typically realized as clauses, sentences, or larger segments of text). Relations can be expressed explicitly or implicitly. E.g., the relation of CAUSE: (a) John did not eat the fish because he is vegetarian. (b) John did not eat the fish. That's because he is vegetarian. (c) John did not eat the fish. He is vegetarian. (d) Being vegetarian, John did not eat the fish. Some work aims to combine individual relations into more complex coherence structures spanning the entirety of a given text, e.g., RST and SDRT. PDTB: annotation of only low-level individual relations, without combining them any further. Why? The jury is still out on the high-level structural representation: trees? DAGs? Unconstrained graphs? The PDTB approach to high-level structure is an empirical one: an emergent high-level structural representation from low-level discourse relation annotation in corpora. PDTB-2 (Prasad et al., 2008), annotated over the WSJ, contains 40,600 relations and was released in 2008.
PDTB Annotation Basics. Input: text (discourse), e.g., John did not eat the fish because he is vegetarian. Step 1 (guidelines: definitions for identifying discourse relations, explicit or implicit, and their arguments): identify individual relations, their explicit realization (if any), and their two arguments. Step 2 (guidelines: argument naming convention; sense classification as a hierarchy): label the arguments (Arg1/Arg2) and the sense of the relation. Result: Arg1 = John did not eat the fish; connective = because; Arg2 = he is vegetarian; sense = Contingency.Cause.Reason.
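The annotation unit described on this slide can be pictured as a small record. The following is a minimal sketch only: the class name `Relation` and its field names are hypothetical and do not reflect the PDTB file format or any PDTB tool; they simply hold the pieces the slide names (two arguments, an optional explicit connective, and a dotted sense label).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical container for one annotated discourse relation."""
    arg1: str
    arg2: str
    sense: str                        # dotted hierarchy, e.g. "Contingency.Cause.Reason"
    connective: Optional[str] = None  # None when the relation has no explicit connective

    @property
    def is_explicit(self) -> bool:
        # A relation is treated as explicit when a connective string is present.
        return self.connective is not None

    def sense_levels(self) -> Tuple[str, ...]:
        # Split the dotted label into its hierarchy levels.
        return tuple(self.sense.split("."))


# The example from the slide.
rel = Relation(
    arg1="John did not eat the fish",
    arg2="he is vegetarian",
    sense="Contingency.Cause.Reason",
    connective="because",
)
```

Here `rel.sense_levels()` splits the label into its three levels, and an implicit variant of the same relation would simply omit `connective`.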
PDTB Enrichment. Limitations of PDTB-2: Not all relations in a text were annotated (our own awareness and feedback from the community). Because lexicalized discourse relations and low-level annotation were being done for the first time on a large scale and in limited time, the guidelines needed improvement to be more reliable and comprehensive. PDTB-3: Addresses some major gaps in the corpus, primarily intra-sentential relations (~13K new relations). Modifications and extensions to the guidelines to make them more reliable and comprehensive. Application of the revised guidelines to PDTB-2. Merging of PDTB-2 and the new relations gives PDTB-3 (~53K relations).
New Relations. Limited scope of PDTB-2 because of guidelines: (1) Guidelines limited annotation to explicit relations lexicalized by discourse connectives, and implicit relations between paragraph-internal adjacent sentences and between (semi-)colon-separated clauses within sentences. (2) Discourse connectives were drawn from the pre-defined syntactic classes of subordinating conjunctions, coordinating conjunctions, and discourse adverbials. (3) Strict constraints on the realization of arguments: with a few exceptions, arguments had to be realized as one or more clauses or sentences.
New Relations. But the constraints precluded many types of intra-sentential relations. They precluded subordinate clauses that can occur without lexical subordinators while bearing an implicit relation to their matrix clause. Free adjuncts: Treasurys opened lower, Implicit=as a result of reacting negatively to news that the producer price index, a measure of inflation on the wholesale level, accelerated in September. (CONTINGENCY.CAUSE.REASON) Free to-infinitives: Banks need a competitive edge Implicit=if (they are) to sell their products. (CONTINGENCY.CONDITION.ARG2-AS-CONDITION)
New Relations. But the constraints precluded many types of intra-sentential relations. They precluded relations triggered by prepositional subordinators like for, by, in, with, instead of, etc., that can take clauses as complements. Examples: But with foreign companies snapping up U.S. movie studios, the networks are pressing their fight harder than ever. (CONTINGENCY.CAUSE.REASON) But on reflection, Mr. Oka says, he concluded that Nissan is being prudent in following its slow-startup strategy instead of simply copying Lexus. (EXPANSION.SUBSTITUTION.ARG1-AS-SUBST)
New Relations. But the constraints precluded many types of intra-sentential relations. They precluded relations between conjoined verb phrases (Webber et al., 2016). Exceptions allowed VPs to be arguments of connectives: She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war-rationed goodies. (CONTINGENCY.CAUSE.REASON) But they did not allow VPs to be arguments of the VP conjunction itself: She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war-rationed goodies. (EXPANSION.CONJUNCTION) Stocks closed higher in Hong Kong, Manila, Singapore, Sydney and Wellington, but were lower in Seoul. (COMPARISON.CONTRAST)
New Relations and Linking. Webber et al. (2016): Arguments of certain explicit relations, particularly CONJUNCTIONS, can also be related by an additional implicit relation: She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war-rationed goodies. (EXPANSION.CONJUNCTION) She became an abortionist accidentally, Implicit=then and continued because it enabled her to buy jam, cocoa and other war-rationed goodies. (TEMPORAL.ASYNCHRONOUS.PRECEDENCE) In PDTB-3: Multiple relations holding between the same two arguments are LINKED in the underlying representation. Linking can involve multiple explicit relations, multiple implicit relations, or an explicit and an implicit relation.
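One way to picture linking is as grouping relation tokens that share the same pair of argument spans. The sketch below is an illustration under stated assumptions, not the PDTB-3 underlying representation: the `Rel` tuple, the function `link_relations`, and the character offsets are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) character offsets


class Rel(NamedTuple):
    arg1: Span
    arg2: Span
    rel_type: str    # "Explicit" or "Implicit"
    connective: str
    sense: str


def link_relations(rels: List[Rel]) -> List[List[Rel]]:
    """Return groups of relations that hold between the same two argument spans."""
    groups: Dict[Tuple[Span, Span], List[Rel]] = defaultdict(list)
    for r in rels:
        groups[(r.arg1, r.arg2)].append(r)
    # Only spans carrying more than one relation count as linked.
    return [g for g in groups.values() if len(g) > 1]


# Illustrative offsets for the "abortionist" example; the numbers are made up.
rels = [
    Rel((0, 38), (44, 120), "Explicit", "and", "EXPANSION.CONJUNCTION"),
    Rel((0, 38), (44, 120), "Implicit", "then", "TEMPORAL.ASYNCHRONOUS.PRECEDENCE"),
    Rel((44, 53), (62, 120), "Explicit", "because", "CONTINGENCY.CAUSE.REASON"),
]
linked = link_relations(rels)
```

In this toy input the explicit conjunction and the implicit precedence relation end up in one linked group, while the "because" relation, whose arguments are different spans, stays unlinked.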
New PDTB-3 Relations: Distribution. VP conjunctions account for about half of the total, but about 20% of these are implicit relations inferred in addition to the explicit conjunction. S Conjunction Implicits: a consequence of our finding that additional implicit inferences can be associated with intra-sentential S conjunctions already annotated in PDTB-2. For PDTB-3, all S conjunction relations in PDTB-2 were revisited and reconsidered for these additional inferences. 32% of the discourse relations associated with S conjunctions are additional implicit inferences.
PDTB-3 Sense Hierarchy. Simplifications: senses at Level-3 now only encode the directionality of the arguments, and so only appear with asymmetric Level-2 senses.
PDTB-3 Sense Hierarchy. Annotating intra-sentential discourse relations revealed asymmetric Level-2 senses for which the relation's arguments occur in either order (rather than the single order assumed in the PDTB-2).
PDTB-3 Sense Hierarchy. Simplifications: Level-2 pragmatic senses have been removed from the hierarchy and replaced with features that can be attached to a relation token to indicate an inference of implicit belief or of a speech act associated with the arguments.
PDTB-3 Sense Hierarchy. Augmentations: New senses have been introduced on an as-needed basis.
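The label shapes used in this transcript (for example EXPANSION.CONJUNCTION, CONTINGENCY.CONDITION.ARG2-AS-CONDITION, and CONTINGENCY.CAUSE.REASON+BELIEF) can be split into hierarchy levels plus an optional feature. This is a minimal sketch: the function name `parse_sense` is hypothetical, and treating the `+` suffix as the belief or speech-act feature is an assumption based only on how the labels are written in the examples here.

```python
from typing import Optional, Tuple


def parse_sense(label: str) -> Tuple[str, Optional[str], Optional[str], Optional[str]]:
    """Split a sense label into (level1, level2, level3, feature).

    Missing levels and a missing feature are returned as None.
    """
    # Assumption: a trailing "+NAME" marks a feature such as BELIEF or SPEECH-ACT.
    base, _, feature = label.partition("+")
    levels = base.split(".")
    if len(levels) > 3:
        raise ValueError(f"expected at most three levels, got: {label!r}")
    # Pad to exactly three levels so callers can unpack uniformly.
    padded = levels + [None] * (3 - len(levels))
    return padded[0], padded[1], padded[2], feature or None
```

A symmetric sense such as EXPANSION.CONJUNCTION comes back with no Level-3 value, which matches the slide's point that Level-3 only appears with asymmetric Level-2 senses.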
Hypophora as a New Relation Type There are many pairs in the corpus where the first sentence (Arg1) expresses a question seeking some information, and the second (Arg2) provides a response to fulfil that need. These relations cannot be instantiated with connectives, explicitly or implicitly. If not now, when? When the fruit is ripe, it falls from the tree by itself, he says. Of all the ethnic tensions in America, which is the most troublesome right now? A good bet would be the tension between blacks and Jews in New York City.
Hypophora as a New Relation Type. The response can answer the question implicitly: So can a magazine survive by downright thumbing its nose at major advertisers? Garbage magazine, billed as The Practical Journal for the Environment, is about to find out. The answer can also indicate that the information need cannot be fulfilled: With all this, can stock prices hold their own? The question is unanswerable at this point, she says. In PDTB-3, these QA pairs are marked as a NEW relation type, called HYPOPHORA, because these relations involve dialogue acts (Bunt et al., 2017), which are treated as distinct from discourse relations in PDTB, and because they cannot be instantiated with connectives. HYPOPHORA does not apply when the subsequent text relates to a question in other ways, for example, with rhetorical questions that are posed for dramatic effect or to make an assertion rather than to elicit an answer, or if the subsequent text provides an explanation for why the question has been asked: What's wrong with asking for more money? Implicit=because Money is not everything, but it is necessary, and business is not volunteer work. (CONTINGENCY.CAUSE.REASON+BELIEF) What sector is stepping forward to pick up the slack? he asked. Implicit=because I draw a blank. (CONTINGENCY.CAUSE.REASON+SPEECH-ACT)
Outline: Introduction; PDTB enrichment; Motivation; New relations; Sense revisions; Guidelines revisions; Changes to argument labeling convention; Extensions to AltLex Identification; Mapping to ISO-DR-Core; Conclusion; Separate effort for (really) full-text annotation.
PDTB-2 Syntax-based Argument Labeling Convention. Reference to realization type, syntactic attachment and linear order: Explicit: Arg2 was the argument to which the connective was attached syntactically; the other argument was Arg1. Implicit: Arg1 was always the first (lefthand) span; Arg2, the adjacent (righthand) span. Abstraction over variation in argument order, in combination with sense semantics, to provide consistency in relation semantics across all variants. Subordinating conjunctions: John ate the fish even though he is a vegetarian. / Even though John is a vegetarian, he ate the fish. The denying span of Concession is the same across variants: CONCESSION.CONTRA-EXPECTATION (Arg1 denies). Coordinating conjunctions: John is a vegetarian but he ate the fish. Discourse adverbials: John is a vegetarian. Nevertheless, he ate the fish. Implicit relations (Impl conn, AltLex, EntRel, NoRel): John is a vegetarian. Despite that, he ate the fish. CONCESSION.EXPECTATION (Arg2 denies).
Inconsistencies with PDTB-2 Arg Labeling. 1. Variability in where an explicit connective can attach within a sentence: Japan not only outstrips the U.S. in investment flows but also outranks it in trade with most Southeast Asian countries. The hacker was pawing over the Berkeley files but also using Berkeley and other easily accessible computers as stepping stones. Not only did Mr. Ortega's comments come in the midst of what was intended as a showcase for the region, it came as Nicaragua is under special international scrutiny. 2. Ability of marked syntax to replace explicit connectives: Had the contest gone a full seven games, ABC could have reaped an extra $10 million in ad sales . . . they probably would have gotten away with it, had they not felt compelled to add Ms. Collins's signature tune, Amazing Grace,
PDTB-3 Syntax-based Argument Labeling Convention More fine-grained reference to syntactic structure, regardless of realization type. Avoids inconsistencies, while not requiring any change to existing labels in PDTB-2. Arguments to inter-sentential discourse relations remain labeled by position: Arg1 is first (lefthand) argument and Arg2, the second (righthand) argument. Arguments of intra-sentential coordinating structures are also labeled by position: Arg1 is the first conjunct and Arg2, the second conjunct. With intra-sentential subordinating structures, Arg1 and Arg2 are determined syntactically. The subordinate structure is always labeled Arg2, and the structure to which it is subordinate is labeled Arg1.
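The three labeling cases on this slide can be written as a small decision function. This is a sketch under stated assumptions: the enum `Structure`, the function `label_args`, and the `subordinate` parameter are hypothetical names introduced for illustration, and the caller is assumed to already know which span is syntactically subordinate.

```python
from enum import Enum
from typing import Dict, Optional


class Structure(Enum):
    INTER_SENTENTIAL = "inter-sentential"
    INTRA_COORDINATING = "intra-sentential coordinating"
    INTRA_SUBORDINATING = "intra-sentential subordinating"


def label_args(first: str, second: str, structure: Structure,
               subordinate: Optional[str] = None) -> Dict[str, str]:
    """Assign Arg1/Arg2 to two spans given in linear (left-to-right) order.

    For subordinating structures, `subordinate` must be "first" or "second",
    naming which span is the subordinate one.
    """
    if structure is Structure.INTRA_SUBORDINATING:
        if subordinate not in ("first", "second"):
            raise ValueError("subordinate must be 'first' or 'second'")
        # The subordinate structure is always Arg2, whatever its position.
        if subordinate == "first":
            return {"Arg1": second, "Arg2": first}
        return {"Arg1": first, "Arg2": second}
    # Inter-sentential relations and coordinating structures: label by position.
    return {"Arg1": first, "Arg2": second}


preposed = label_args("Even though John is a vegetarian", "he ate the fish",
                      Structure.INTRA_SUBORDINATING, subordinate="first")
```

With the preposed subordinate clause above, `preposed["Arg2"]` is the "Even though" clause even though it comes first in the sentence, while a coordinating or inter-sentential pair keeps its left-to-right order.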
Extensions to AltLex Identification. AltLex: In the absence of an explicit connective, if annotators inferred a relation between the sentences but felt that the insertion of an implicit connective would be redundant, they were asked to identify the non-connective expression in Arg2 that they took as the source of the perceived redundancy as the AltLex. (1) Allowance to include material for the AltLex expression from both Arg1 and Arg2: Some of the proposals are so close that non-financial issues such as timing may play a more important role. (CONTINGENCY.CAUSE.RESULT) Things have gone too far for the government to stop them now. (CONTINGENCY.CAUSE.RESULT)
Extensions to AltLex Identification. (2) Allowance to represent the expression of discourse relations with syntactic constructions. Predicate inversion: Crude as they were, these early PCs triggered explosive product development in desktop models for the home and office. (COMPARISON.CONCESSION.ARG1-AS-DENIER) AUX inversion: Had the contest gone a full seven games, ABC could have reaped an extra $10 million in ad sales on the seventh game alone, compared with the ad take it would have received for regular prime-time shows. (CONTINGENCY.CONDITION.ARG2-AS-CONDITION)
Mapping to ISO-DR-Core. ISO 24617-8: Effort to develop an international standard for the annotation of discourse relations. Provide clear and mutually consistent definitions of a set of core discourse relations (senses): ISO-DR-Core. Provide mappings from ISO-DR-Core relations to relations in different frameworks, including the PDTB (Bunt and Prasad, 2016).
Mapping to ISO-DR-Core. Is the modified PDTB sense hierarchy mappable to the ISO-DR-Core relations? New senses with a 1:1 mapping: PURPOSE, NEGATIVE CONDITION, SIMILARITY, MANNER. New senses that do not have a correlate: ARG2-AS-NEGGOAL (under Level-2 PURPOSE), NEGATIVE RESULT (under Level-2 CAUSE). Like the negative counterpart of condition, ISO-DR-Core should be extended to include the negative counterpart for CAUSE and PURPOSE. However, it remains an open question whether these relations should be defined in a way that captures both argument directionalities. In PDTB, there is no evidence yet for the reverse directionality for these senses.
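The mapping status reported on this slide can be recorded as a simple lookup. The sketch below covers only the six new senses the slide names; the set names and the function `iso_mapping_status` are hypothetical, and no ISO-DR-Core labels are asserted beyond what the slide states.

```python
# New PDTB-3 senses the slide reports as mapping 1:1 to ISO-DR-Core.
ONE_TO_ONE = {"PURPOSE", "NEGATIVE CONDITION", "SIMILARITY", "MANNER"}

# New PDTB-3 senses the slide reports as having no ISO-DR-Core correlate.
NO_CORRELATE = {"ARG2-AS-NEGGOAL", "NEGATIVE RESULT"}


def iso_mapping_status(sense: str) -> str:
    """Report the mapping status of a new PDTB-3 sense, per the slide."""
    key = sense.strip().upper()
    if key in ONE_TO_ONE:
        return "1:1"
    if key in NO_CORRELATE:
        return "no correlate"
    # Senses the slide does not list are left undecided rather than guessed.
    return "not listed"
```

Returning "not listed" for anything else keeps the lookup from implying a mapping decision the slide does not make.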
Mapping to ISO-DR-Core We still have not covered the conceptual space for discourse relations. Desirable approach: characterize ontology by considering semantic possibilities, with a language-independent approach, and a corpus-independent approach
Conclusion: Corpus Release and Consistency. PDTB-3 is expected to be distributed in Fall 2018, through the Linguistic Data Consortium (http://www.ldc.upenn.edu): Corpus (LDC); Manual/guidelines and tools (LDC and PDTB website, http://www.seas.upenn.edu/~pdtb). Annotation Quality: Annotation; Adjudication; Additional consistency checking in PDTB-3; Merge of PDTB-3 and PDTB-2.
Full-Text Annotation of Discourse Relations. A separate effort (Prasad et al., 2017) from the PDTB-3: Annotation of cross-paragraph implicit relations that are not annotated in either PDTB-2 or PDTB-3. When merged with PDTB-3, this gives full-text annotation of discourse relations. Done over 145 texts (Sections 01, 06, and 23 of the corpus). However, the annotation guidelines developed for the cross-paragraph annotation led to some departures from the PDTB-2/3 guidelines in ways not incorporated in PDTB-3. To be distributed to the community separately, via GitHub (https://github.com/pdtb-upenn/full-text).
Thank You. Questions?