Discourse Coherence and Annotation in PDTB

Discourse Annotation in the PDTB-3: The Next Generation
Rashmi Prasad, Bonnie Webber, Alan Lee
*Aravind Joshi
Outline
Introduction
Discourse Coherence and its annotation
PDTB Basics
PDTB enrichment
Motivation
New relations
Sense revisions
Guidelines revisions
Mapping to ISO-DR-Core
Conclusion
Separate effort for (really) full-text annotation
Discourse Coherence and its Annotation
Starting from the 80s, NLP has seen work on discourse coherence as a function of relations between eventualities and
propositions (typically realized as clauses, sentences, or larger segments of text).
Relations can be expressed explicitly or implicitly.
E.g., Relation of CAUSE:
John did not eat the fish because he is vegetarian.
John did not eat the fish. That’s because he is vegetarian.
John did not eat the fish. He is vegetarian.
Being vegetarian, John did not eat the fish.
Some work aims to combine individual relations into more complex coherence structures spanning the entirety of a given text (e.g., RST, SDRT).
PDTB: Annotation of only low-level individual relations, without combining them any further. Why?
Jury still out on high-level structural representation: trees? DAGs? Unconstrained graphs?
PDTB approach to high-level structure is an empirical one: Emergent high-level structural representation from low-level discourse relation annotation in corpora.
PDTB-2 (Prasad et al., 2008), annotated over WSJ, 40600 relations, released in 2008
PDTB Annotation Basics
 
GUIDELINES:
Definitions for identifying discourse relations (explicit/implicit) and arguments
Arg naming convention
Sense Classification (as hierarchy)
Text (Discourse)
John did not eat the fish because he is vegetarian
Identify individual relations, their explicit realization (if any) and their (two) arguments
John did not eat the fish because he is vegetarian.
Label arguments (Arg1/Arg2) and the sense of the relation
[Arg1: John did not eat the fish] because [Arg2: he is vegetarian]. (Contingency.Cause.Reason)
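As a rough illustration of what one annotated relation carries, the sketch below stores the two arguments, the optional connective, and the sense label for the running example. The `Relation` class and its field names are hypothetical and do not reflect the PDTB distribution format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """One low-level discourse relation (illustrative layout, not the PDTB file format)."""
    arg1: str
    arg2: str
    sense: str                        # dotted path through the sense hierarchy
    connective: Optional[str] = None  # None when no explicit connective is present

    @property
    def is_explicit(self) -> bool:
        # A relation counts as explicit here when a connective string is recorded.
        return self.connective is not None


rel = Relation(
    arg1="John did not eat the fish",
    connective="because",
    arg2="he is vegetarian",
    sense="Contingency.Cause.Reason",
)
```

Keeping the sense as a dotted string mirrors how labels are written on the slides; a real reader for the corpus would likely need more fields (spans, relation type, attribution).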
PDTB2 Sense Hierarchy
PDTB Enrichment
Limitations of PDTB-2
Not all relations in a text are annotated (our own awareness and feedback from community)
Because lexicalized discourse relations and low-level annotation were being done for the first time on a large scale in a limited time, guidelines need improvement to be more reliable and comprehensive
PDTB-3
Addresses some major gaps in the corpus, primarily intra-sentential relations: ~13K new relations
Modifications and extensions to guidelines to make them more reliable and comprehensive
Application of revised guidelines to PDTB-2
Merging of PDTB-2 and new relations yields PDTB-3 (~53K relations)
New Relations
Limited scope of PDTB-2 because of guidelines:
Guidelines limited annotation to
explicit relations lexicalized by discourse connectives, and
implicit relations between paragraph-internal adjacent sentences and between (semi-) colon separated clauses within sentences.
Discourse connectives were drawn from the pre-defined syntactic classes: subordinating conjunctions, coordinating conjunctions, and discourse adverbials.
Strict constraints on realization of arguments:
With a few exceptions, arguments had to be realized as one or more clauses or sentences.
New Relations
But the constraints precluded many types of intra-sentential relations
Explicit relations lexicalized by discourse connectives, and implicit relations between paragraph-internal adjacent sentences and between (semi-) colon separated clauses within sentences.
Discourse connectives were drawn from the pre-defined syntactic classes
Arguments had to be realized as one or more clauses or sentences.
Precluded subordinate clauses that can occur without lexical subordinators while bearing an implicit relation to their matrix clause.
Free adjuncts
Treasurys opened lower, Implicit=as a result of reacting negatively to news that the producer price index – a measure of inflation on the wholesale level – accelerated in September. (CONTINGENCY.CAUSE.REASON)
Free to-infinitives
Banks need a competitive edge Implicit=if (they are) to sell their products. (CONTINGENCY.CONDITION.ARG2-AS-CONDITION)
New Relations
But the constraints precluded many types of intra-sentential relations
Precluded relations triggered by prepositional subordinators like for, by, in, with, instead of, etc., that can complementize for clauses.
But with foreign companies snapping up U.S. movie studios, the networks are pressing their fight harder than ever. (CONTINGENCY.CAUSE.REASON)
But on reflection, Mr. Oka says, he concluded that Nissan is being prudent in following its slow-startup strategy instead of simply copying Lexus. (EXPANSION.SUBSTITUTION.ARG1-AS-SUBST)
New Relations
But the constraints precluded many types of intra-sentential relations
Precluded relations between conjoined verb phrases (Webber et al., 2016).
Exceptions allowed VPs to be arguments of connectives
She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war rationed goodies. (CONTINGENCY.CAUSE.REASON)
but not of the VP conjunction itself.
She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war rationed goodies. (EXPANSION.CONJUNCTION)
Stocks closed higher in Hong Kong, Manila, Singapore, Sydney and Wellington, but were lower in Seoul. (COMPARISON.CONTRAST)
New Relations and Linking
Webber et al. (2016): Arguments of certain explicit relations, particularly CONJUNCTIONS, can also be related by an additional implicit relation:
She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war rationed goodies. (EXPANSION.CONJUNCTION)
She became an abortionist accidentally, Implicit=then and continued because it enabled her to buy jam, cocoa and other war-rationed goodies. (TEMPORAL.ASYNCHRONOUS.PRECEDENCE)
In PDTB-3: Multiple relations holding between the same two arguments are LINKED in the underlying representation.
Linking can involve multiple explicit relations, multiple implicit relations, or an explicit and implicit relation.
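One way to picture linking is to group relation tokens by their argument pair and keep the groups with more than one token. The sketch below does this for the abortionist example. The tuple layout, the `link_relations` helper, and the exact argument spans are assumptions made for illustration, not the PDTB-3 underlying representation.

```python
from collections import defaultdict

# Argument spans for the example sentence (approximate, for illustration only).
ARG1 = "became an abortionist accidentally"
ARG2 = "continued because it enabled her to buy jam, cocoa and other war rationed goodies"

# Each tuple: (arg1, arg2, relation type, connective, sense). Hypothetical layout.
relations = [
    (ARG1, ARG2, "Explicit", "and", "EXPANSION.CONJUNCTION"),
    (ARG1, ARG2, "Implicit", "then", "TEMPORAL.ASYNCHRONOUS.PRECEDENCE"),
]


def link_relations(rels):
    """Group relation tokens that hold between the same two arguments."""
    by_args = defaultdict(list)
    for arg1, arg2, rel_type, connective, sense in rels:
        by_args[(arg1, arg2)].append((rel_type, connective, sense))
    # Only argument pairs carrying more than one relation count as linked.
    return {args: tokens for args, tokens in by_args.items() if len(tokens) > 1}


linked = link_relations(relations)
```

Here the explicit conjunction and the implicit temporal relation end up in the same group, which matches the explicit-plus-implicit case described above.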
New PDTB-3 Relations: Distribution
VP conjunctions account for about half of the total, but about 20% of these are implicit relations inferred in addition to the explicit conjunction.
S Conjunction Implicits: A consequence of our finding that additional implicit inferences can be associated with intra-sentential S conjunctions already annotated in PDTB-2.
For PDTB-3, all S conjunction relations in PDTB-2 were revisited and reconsidered for these additional inferences.
32% of the discourse relations associated with S conjunctions are additional implicit inferences.
PDTB2 Sense Hierarchy
PDTB3 Sense Hierarchy
Simplifications: senses at Level-3 now only encode directionality of the arguments, and so only appear with asymmetric Level-2 senses
PDTB3 Sense Hierarchy
Annotating intra-sentential discourse relations revealed asymmetric Level-2 senses for which the relation’s arguments occur in either order (rather than the single order assumed in the PDTB-2).
PDTB3 Sense Hierarchy
Simplifications: Level-2 pragmatic senses have been removed from the hierarchy and replaced with features that can be attached to a relation token to indicate an inference of implicit belief or of a speech act associated with arguments.
PDTB3 Sense Hierarchy
Augmentations: New senses have been introduced on an “as needed” basis
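Sense labels in this deck are written as dotted paths, with belief or speech-act features appended after a plus sign (for instance CONTINGENCY.CAUSE.REASON+BELIEF). A minimal sketch of splitting such a label into hierarchy levels and features follows. The `parse_sense` function and its output keys are hypothetical, and the label syntax is assumed from the slide notation rather than from a corpus specification.

```python
def parse_sense(label: str) -> dict:
    """Split a slide-style sense label into hierarchy levels and attached features."""
    # Features such as BELIEF or SPEECH-ACT follow the sense, separated by '+'.
    sense, *features = label.split("+")
    levels = sense.split(".")
    # Pad to three levels so that missing levels come back as None.
    levels += [None] * (3 - len(levels))
    return {
        "level1": levels[0],
        "level2": levels[1],
        "level3": levels[2],  # in PDTB-3 this level encodes argument directionality
        "features": features,
    }


full = parse_sense("CONTINGENCY.CAUSE.REASON+BELIEF")
symmetric = parse_sense("EXPANSION.CONJUNCTION")
```

A symmetric Level-2 sense such as EXPANSION.CONJUNCTION simply has no Level-3 component, which the sketch reports as `None`.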
Hypophora as a New Relation Type
There are many pairs in the corpus where the first sentence (Arg1) expresses a question seeking some information, and the second (Arg2) provides a response to fulfil that need.
These relations cannot be instantiated with connectives, explicitly or implicitly.
If not now, when? “When the fruit is ripe, it falls from the tree by itself,” he says.
Of all the ethnic tensions in America, which is the most troublesome right now? A good bet would be the tension between blacks and Jews in New York City.
Hypophora as a New Relation Type
The response to the question can provide the requested information implicitly
So can a magazine survive by downright thumbing its nose at major advertisers? Garbage magazine, billed as “The Practical Journal for the Environment,” is about to find out.
And the answer can also indicate that the information need cannot be fulfilled
With all this, can stock prices hold their own? “The question is unanswerable at this point,” she says.
In PDTB-3, these QA pairs are marked as a NEW relation type, called HYPOPHORA, because these relations involve dialogue acts (Bunt et al., 2017), which are treated as distinct from discourse relations in PDTB, and because they cannot be instantiated with connectives.
HYPOPHORA does not apply when the subsequent text relates to a question in other ways – for example, with rhetorical questions that are posed for dramatic effect or to make an assertion, rather than to elicit an answer, or if the subsequent text provides an explanation for why the question has been asked.
What’s wrong with asking for more money? Implicit=because Money is not everything, but it is necessary, and business is not volunteer work. (CONTINGENCY.CAUSE.REASON+BELIEF)
“What sector is stepping forward to pick up the slack?” he asked. Implicit=because “I draw a blank.” (CONTINGENCY.CAUSE.REASON+SPEECH-ACT)
Outline
Introduction
PDTB enrichment
Motivation
New relations
Sense revisions
Guidelines revisions
Changes to argument labeling convention
Extensions to AltLex Identification
Mapping to ISO-DR-Core
Conclusion
Separate effort for (really) full-text annotation
PDTB-2 Syntax-based Argument Labeling Convention
Reference to realization type, syntactic attachment and linear order:
Explicit: Arg2 was the argument to which the connective was attached syntactically; the other argument was Arg1.
Implicit: Arg1 was always the first (lefthand) span; Arg2, the adjacent (righthand) span.
Abstraction over variation in argument order, in combination with sense semantics, to provide consistency in relation semantics across all variants:
Subordinating Conjunctions:
John ate the fish even though he is a vegetarian.
Even though John is a vegetarian, he ate the fish.
CONCESSION.CONTRA-EXPECTATION (Arg1 denies)
Coordinating conjunctions:
John is a vegetarian but he ate the fish.
Discourse adverbials:
John is a vegetarian. Nevertheless, he ate the fish.
Implicit Relations (Impl conn, AltLex, EntRel, NoRel):
John is a vegetarian. Despite that, he ate the fish.
CONCESSION.EXPECTATION (Arg2 denies)
Denying span of Concession is the same across variants
Inconsistencies with PDTB-2 Arg Labeling
1. Variability in where an explicit connective can attach within a sentence
Japan not only outstrips the U.S. in investment flows but also outranks it in trade with most Southeast Asian countries
The hacker was pawing over the Berkeley files but also using Berkeley and other easily accessible computers as stepping stones
Not only did Mr. Ortega’s comments come in the midst of what was intended as a showcase for the region, it came as Nicaragua is under special international scrutiny
2. Ability of marked syntax to replace explicit connectives.
Had the contest gone a full seven games, ABC could have reaped an extra $10 million in ad sales
. . . they probably would have gotten away with it, had they not felt compelled to add Ms. Collins’s signature tune, “Amazing Grace,”
PDTB-3 Syntax-based Argument Labeling Convention
More fine-grained reference to syntactic structure, regardless of realization type.
Avoids inconsistencies, while not requiring any change to existing labels in PDTB-2.
Arguments to inter-sentential discourse relations remain labeled by position: Arg1 is first (lefthand) argument and Arg2, the second (righthand) argument.
Arguments of intra-sentential coordinating structures are also labeled by position: Arg1 is the first conjunct and Arg2, the second conjunct.
With intra-sentential subordinating structures, Arg1 and Arg2 are determined syntactically. The subordinate structure is always labeled Arg2, and the structure to which it is subordinate is labeled Arg1.
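The three labeling cases above can be read as a small decision rule. The sketch below encodes it under simplifying assumptions: the `label_arguments` helper, the structure names, and the `subordinate_first` flag are all hypothetical, and deciding which span is subordinate is left to the caller.

```python
def label_arguments(structure, first_span, second_span, subordinate_first=False):
    """Assign Arg1/Arg2 following the PDTB-3 convention as summarized on the slide.

    structure: "inter-sentential", "coordinating", or "subordinating"
    first_span / second_span: the two argument spans in linear order
    subordinate_first: for subordinating structures, whether the subordinate
        clause is the first span in linear order
    """
    if structure in ("inter-sentential", "coordinating"):
        # Labeled purely by position.
        return {"Arg1": first_span, "Arg2": second_span}
    if structure == "subordinating":
        # The subordinate structure is always Arg2, wherever it appears.
        if subordinate_first:
            return {"Arg1": second_span, "Arg2": first_span}
        return {"Arg1": first_span, "Arg2": second_span}
    raise ValueError(f"unknown structure type: {structure}")


# "Even though John is a vegetarian, he ate the fish."
fronted = label_arguments(
    "subordinating", "John is a vegetarian", "he ate the fish", subordinate_first=True
)
# "John is a vegetarian but he ate the fish."
coordinated = label_arguments("coordinating", "John is a vegetarian", "he ate the fish")
```

Because realization type (explicit connective versus marked syntax) does not enter the rule, a fronted subordinate clause receives the same labels whether or not a connective is present.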
Extensions to AltLex Identification
AltLex: In the absence of an explicit connective, if annotators inferred a relation between the sentences but felt that the insertion of an implicit connective would be redundant, they were asked to identify the non-connective expression in Arg2 that they took as the source of the perceived redundancy as the AltLex.
(1) Allowance to include material for the AltLex expression from both Arg1 and Arg2.
Some of the proposals are so close that non-financial issues such as timing may play a more important role. (CONTINGENCY.CAUSE.RESULT)
Things have gone too far for the government to stop them now. (CONTINGENCY.CAUSE.RESULT)
Extensions to AltLex Identification
(2) Allowance to represent the expression of discourse relations with syntactic constructions.
Predicate Inversion: Crude as they were, these early PCs triggered explosive product development in desktop models for the home and office. (COMPARISON.CONCESSION.ARG1-AS-DENIER)
AUX Inversion: Had the contest gone a full seven games, ABC could have reaped an extra $10 million in ad sales on the seventh game alone, compared with the ad take it would have received for regular prime-time shows. (CONTINGENCY.CONDITION.ARG2-AS-CONDITION)
Mapping to ISO-DR-Core
ISO 24617-8
Effort to develop an international standard for the annotation of discourse relations.
Provide clear and mutually consistent definitions of a set of core discourse relations (senses) – ISO-DR-Core
Provide mappings from ISO-DR-Core relations to relations in different frameworks, including the PDTB.
(Bunt and Prasad, 2016)
Mapping to ISO-DR-Core
Is the modified PDTB sense hierarchy mappable to the ISO-DR-Core relations? 
New senses with 1:1 mapping
PURPOSE
NEGATIVE CONDITION
SIMILARITY
MANNER
New senses that do not have a correlate
ARG2-AS-NEGGOAL (under Level-2 PURPOSE)
NEGATIVE RESULT (under Level-2 CAUSE)
As with the negative counterpart of CONDITION, ISO-DR-Core should be extended to include negative counterparts for CAUSE and PURPOSE.
However, it remains an open question whether these relations should be defined in a way that captures both argument directionalities. In PDTB, there is no evidence yet for the reverse directionality for these senses.
Mapping to ISO-DR-Core
We still have not covered the conceptual space for discourse relations.
Desirable approach: characterize ontology by considering semantic possibilities, with
a language-independent approach, and
a corpus-independent approach
Conclusion: Corpus Release and Consistency
PDTB-3 expected to be distributed in Fall 2018, through Linguistic Data Consortium (http://www.ldc.upenn.edu)
Corpus (LDC)
Manual/guidelines and tools (LDC and PDTB website, http://www.seas.upenn.edu/~pdtb)
Annotation Quality:
Annotation, then adjudication, then additional consistency checking in PDTB-3
Merge of PDTB-3 and PDTB-2
Full-Text Annotation of Discourse Relations
A separate effort (Prasad et al., 2017) from the PDTB-3: Annotation of cross-paragraph implicit relations that are not annotated in either PDTB-2 or PDTB-3.
When merged with PDTB-3, this yields full-text annotation of discourse relations
Done over 145 texts (Sections 01, 06, and 23 of the corpus)
However, annotation guidelines developed for the cross-paragraph annotation led to some departures from PDTB-2/3 guidelines in ways not incorporated in PDTB-3
To be distributed to the community separately, via github (https://github.com/pdtb-upenn/full-text).
Thank You.
Questions?

NLP research on discourse coherence explores relations between events and propositions expressed in text, with a focus on combining individual relations into complex coherence structures. The PDTB approach annotates low-level relations in corpora to derive emergent high-level structural representations. This involves identifying discourse relations, arguments, and sense classifications, leading to a deeper understanding of text coherence.

  • NLP research
  • Discourse coherence
  • PDTB annotation
  • Text analysis

Uploaded on Sep 14, 2024 | 1 Views



Presentation Transcript


  1. Discourse Annotation in the PDTB Discourse Annotation in the PDTB- -3: The Next Generation The Next Generation 3: Rashmi Prasad, Bonnie Webber, Alan Lee *Aravind Joshi

  2. Outline Introduction Discourse Coherence and its annotation PDTB Basics PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  3. Discourse Coherence and its Annotation Starting from the 80s, NLP has seen work on discourse coherence as a function of relations between eventualities and propositions (typically realized as clauses, sentences, or larger segments of text). Relations can be expressed explicity or implicitly. E.g., Relation of CAUSE: John did not eat the fish because he is vegetarian. John did not eat the fish. That s because he is vegetarian. John did not eat the fish. He is vegetarian. Being vegetarian, John did not eat the fish. Some work aims to combine individual relations into more complex coherence structures spanning the entirety of a given text E.g., RST, SDRT PDTB Annotation of only low-level individual relations, without combining them any further. Why? Jury still out on high-level structural representation trees? DAGs? Unconstrained graphs? PDTB approach to high-level structure is an empirical one: Emergent high-level structural representation from low-level discourse relation annotation in corpora. PDTB-2 (Prasad et al., 2008), annotated over WSJ, 40600 relations, released in 2008

  4. Outline Introduction Discourse Coherence and its annotation PDTB Basics PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  5. PDTB Annotation Basics GUIDELINES Text (Discourse) Definitions for identifying discourse relations (explicit/implicit) and arguments John did not eat the fish because he is vegetarian Identify individual relations, their explicit realization (if any) and their (two) arguments John did not eat the fish because he is vegetarian. Label arguments (Arg1/Arg2) and the sense of the relation Arg naming convention John did not eat the fish because he is vegetarian. Arg1 Contingency.Cause.Reason Arg2 Sense Classification (as hierarchy)

  6. PDTB2 Sense Hierarchy

  7. Outline Introduction Discourse Coherence and its annotation PDTB Basics PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  8. PDTB Enrichment Limitations of PDTB-2 All relations in a text not annotated (our own awareness and feedback from community) Because lexicalized discourse relations and low-level annotation was being done for the first time on a large scale in a limited time, guidelines need improvement to be more reliable and comprehensive PDTB-3 Addresses some major gaps in the corpus, primarily intra-sentential relations ~ 13K new relations Modifications and extensions to guidelines to make them more reliable and comprehensive Application of revised guidelines to PDTB2 Merging of PDTB-2 and new relations PDTB-3 (~53K relations)

  9. Outline Introduction Discourse Coherence and its annotation PDTB Basics PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  10. New Relations Limited scope of PDTB-2 because of guidelines: Guidelines limited annotation to explicit relations lexicalized by discourse connectives, and implicit relations between paragraph-internal adjacent sentences and between (semi-) colon separated clauses within sentences. Discourse connectives were drawn from the pre-defined syntactic classes subordinating conjunctions, coordinating conjunctions, and discourse adverbials. Strict constraints on realization of arguments: With a few exceptions, arguments had to be realized as one or more clauses or sentences.

  11. New Relations But the constraints precluded many types of intra-sentential relations Precluded subordinate clauses that can occur without lexical subordinators while bearing an implicit relation to their matrix clause. Explicit relations lexicalized by discourse connectives, and implicit relations between paragraph- internal adjacent sentences and between (semi-) colon separated clauses within sentences. Free adjuncts Treasurys opened lower, Implicit=as a result of reacting negatively to news that the producer price index a measure of inflation on the wholesale level accelerated in September. (CONTINGENCY.CAUSE.REASON) Discourse connectives were drawn from the pre-defined syntactic classes Free to-infinitives Banks need a competitive edge Implicit=if (they are) to sell their products. (CONTINGENCY.CONDITION.ARG2-AS-CONDITION) Arguments had to be realized as one or more clauses or sentences.

  12. New Relations But the constraints precluded many types of intra-sentential relations Precluded relations triggered by prepositional subordinators like for, by, in, with, instead of, etc., that can complementize for clauses. Explicit relations lexicalized by discourse connectives, and implicit relations between paragraph- internal adjacent sentences and between (semi-) colon separated clauses within sentences. But with foreign companies snapping up U.S. movie studios, the networks are pressing their fight harder than ever. (CONTINGENCY.CAUSE.REASON) Discourse connectives were drawn from the pre-defined syntactic classes But on reflection, Mr. Oka says, he concluded that Nissan is being prudent in following its slow-startup strategy instead of simply copying Lexus. (EXPANSION.SUBSTITUTION.ARG1-AS-SUBST) Arguments had to be realized as one or more clauses or sentences.

  13. New Relations But the constraints precluded many types of intra-sentential relations Precluded relations between conjoined verb phrases (Webber et al., 2016). Explicit relations lexicalized by discourse connectives, and implicit relations between paragraph- internal adjacent sentences and between (semi-) colon separated clauses within sentences. Exceptions allowed VPs to be arguments of connectives She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war rationed goodies. (CONTINGENCY.CAUSE.REASON) Discourse connectives were drawn from the pre-defined syntactic classes but not of the VP conjunction itself. She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war rationed goodies. (EXPANSION.CONJUNCTION) Arguments had to be realized as one or more clauses or sentences. Stocks closed higher in Hong Kong, Manila, Singapore, Sydney and Wellington, but were lower in Seoul. (COMPARISON.CONTRAST)

  14. New Relations and Linking Webber et al. (2016): Arguments of certain explicit relations, particularly CONJUNCTIONS, can also be related by an additional implicit relation: She became an abortionist accidentally, and continued because it enabled her to buy jam, cocoa and other war rationed goodies. (EXPANSION.CONJUNCTION) She became an abortionist accidentally, Implicit=then and continued because it enabled her to buy jam, cocoa and other war-rationed goodies. (TEMPORAL.ASYNCHRONOUS.PRECEDENCE) In PDTB-3: Multiple relations holding between the same two arguments are LINKED in the underlying representation. Linking can involve mutliple explicit relations, multiple implicit relations, or an explicit and implicit relation.

  15. New PDTB-3 Relations: Distribution VP conjunctions account for about half of the total, but about 20% of these are implicit relations inferred in addition to the explicit conjunction S Conjunction Implicits: A consequence of our finding that additional implicit inferences can be associated with intra-sentential S conjunctions already annotated in PDTB-2 For PDTB-3, all S conjunction relations in PDTB-2 were revisited and reconsidered for these additional inferences 32% of the discourse relations associated with S conjunctions are additional implicit inferences

  16. Outline Introduction Discourse Coherence and its annotation PDTB Basics PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  17. PDTB2 Sense Hierarchy

  18. PDTB3 Sense Hierarchy Simplifications: senses at Level-3 now only encode directionality of the arguments, and so only appear with asymmetric Level-2 senses

  19. PDTB3 Sense Hierarchy Annotating intra-sentential discourse relations revealed asymmetric Level-2 senses for which the relation s arguments occur in either order (rather than the single order assumed in the PDTB-2).

  20. PDTB3 Sense Hierarchy Simplifications: Level-2 pragmatic senses have been removed from the hierarchy and replaced with features that can be attached to a relation token to indicate an inference of implicit belief or of a speech act associated with arguments.

  21. PDTB3 Sense Hierarchy Augmentations: New senses have been introduced on an as needed basis

  22. Hypophora as a New Relation Type There are many pairs in the corpus where the first sentence (Arg1) expresses a question seeking some information, and the second (Arg2) provides a response to fulfil that need. These relations cannot be instantiated with connectives, explicitly or implicitly. If not now, when? When the fruit is ripe, it falls from the tree by itself, he says. Of all the ethnic tensions in America, which is the most troublesome right now? A good bet would be the tension between blacks and Jews in New York City.

  23. Hypophora as new Relation Type So can a magazine survive by downright thumbing its nose at major advertisers? Garbage magazine, billed as The Practical Journal for the Environment, is about to find out. The response to the question can answer the information implicitly With all this, can stock prices hold their own? The question is unanswerable at this point she says. And the answer can also indicate that the information need cannot be fulfilled In PDTB-3, these QA pairs are marked as a NEW relation type, called HYPOPHORA, because these relations involve dialogue acts (Bunt et al., 2017), which are treated as distinct from discourse relations in PDTB, and because they are uninstantiable as connectives HYPOPHORA does not apply when the subsequent text relates to a question in other ways for example, with rhetorical questions that are posed for dramatic effect or to make an assertion, rather than to elicit an answer, or if the subsequent text provides an explanation for why the question has been asked What s wrong with asking for more money? Implicit=because Money is not everything, but it is necessary, and business is not volunteer work. (CONTINGENCY.CAUSE.REASON+BELIEF) What sector is stepping forward to pick up the slack? he asked. Implicit=because I draw a blank. (CONTINGENCY.CAUSE.REASON+SPEECH-ACT)

  24. Outline Introduction PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Changes to argument labeling convention Extensions to AltLex Identification Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  25. PDTB-2 Syntax-based Argument Labeling Convention Reference to realization type, syntactic attachment and linear order: Explicit: Arg2 was the argument to which the connective was attached syntactically; the other argument was Arg1. Implicit: Arg1 was always the first (lefthand) span; Arg2, the adjacent (righthand) span. Abstraction over variation in argument order, in combination with sense semantics, to provide consistency in relation semantics across all variants: Subordinating Conjunctions: John ate the fish even though he is a vegetarian. Even though John is a vegetarian, he ate the fish. Denying span of Concession is the same across variants CONCESSION.CONTRA-EXPECTATION (Arg1 denies) Coordinating conjunctions: John is a vegetarian but he ate the fish. Discourse adverbials: John is a vegetarian. Nevertheless, he ate the fish. Implicit Relations (Impl conn, AltLex, EntRel, NoRel): John is a vegetarian. Despite that, he ate the fish. CONCESSION.EXPECTATION (Arg2 denies)

  26. Inconsistencies with PDTB-2 Arg Labeling 1. Variability in where an explicit connective can attach within a sentence Japan not only outstrips the U.S. in investment flows but also outranks it in trade with most Southeast Asian countries The hacker was pawing over the Berkeley files but also using Berkeley and other easily accessible computers as stepping stones Not only did Mr. Ortega s comments come in the midst of what was intended as a showcase for the region, it came as Nicaragua is under special international scrutiny 2. Ability of marked syntax to replace explicit connectives. Had the contest gone a full seven games, ABC could have reaped an extra $10 million in ad sales . . . they probably would have gotten away with it, had they not felt compelled to add Ms. Collins s signature tune, Amazing Grace,

  27. PDTB-3 Syntax-based Argument Labeling Convention More fine-grained reference to syntactic structure, regardless of realization type. Avoids inconsistencies, while not requiring any change to existing labels in PDTB-2. Arguments to inter-sentential discourse relations remain labeled by position: Arg1 is first (lefthand) argument and Arg2, the second (righthand) argument. Arguments of intra-sentential coordinating structures are also labeled by position: Arg1 is the first conjunct and Arg2, the second conjunct. With intra-sentential subordinating structures, Arg1 and Arg2 are determined syntactically. The subordinate structure is always labeled Arg2, and the structure to which it is subordinate is labeled Arg1.
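The three cases of the PDTB-3 convention can be sketched as a single function. This is an illustrative sketch under stated assumptions, not the PDTB-3 implementation: the `Structure` enum, the function `label_pdtb3`, and the `subordinate_index` parameter are hypothetical names, and the caller is assumed to already know which span is the subordinate structure.

```python
from enum import Enum, auto


class Structure(Enum):
    """Hypothetical labels for the three cases named on the slide."""
    INTER_SENTENTIAL = auto()
    INTRA_COORDINATING = auto()
    INTRA_SUBORDINATING = auto()


def label_pdtb3(spans, structure, subordinate_index=None):
    """Assign Arg1/Arg2 to two spans given in linear (left-to-right) order.

    subordinate_index (0 or 1) says which span is the subordinate structure;
    it is only consulted for intra-sentential subordinating structures.
    """
    first, second = spans
    if structure is Structure.INTRA_SUBORDINATING:
        if subordinate_index not in (0, 1):
            raise ValueError("subordinate_index must be 0 or 1")
        # The subordinate structure is always Arg2; its host structure is Arg1.
        return {"Arg1": spans[1 - subordinate_index], "Arg2": spans[subordinate_index]}
    # Inter-sentential relations and coordinating structures are labeled by position.
    return {"Arg1": first, "Arg2": second}


# AUX inversion with no explicit connective: the subordinate clause comes first.
inverted = label_pdtb3(
    ("Had the contest gone a full seven games",
     "ABC could have reaped an extra $10 million in ad sales"),
    Structure.INTRA_SUBORDINATING,
    subordinate_index=0,
)

coordinated = label_pdtb3(
    ("John is a vegetarian", "he ate the fish"),
    Structure.INTRA_COORDINATING,
)
```

Because the rule refers only to syntactic structure and position, it gives the same answer whether or not an explicit connective is present.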

  28. Outline Introduction PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Changes to argument labeling convention Extensions to AltLex Identification Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  29. Extensions to AltLex Identification AltLex: In the absence of an explicit connective, if annotators inferred a relation between the sentences but felt that the insertion of an implicit connective would be redundant, they were asked to identify the non-connective expression in Arg2 that they took as the source of the perceived redundancy as the AltLex. (1) Allowance to include material for the AltLex expression from both Arg1 and Arg2. Some of the proposals are so close that non-financial issues such as timing may play a more important role. (CONTINGENCY.CAUSE.RESULT) Things have gone too far for the government to stop them now. (CONTINGENCY.CAUSE.RESULT)

  30. Extensions to AltLex Identification AltLex: In the absence of an explicit connective, if annotators inferred a relation between the sentences but felt that the insertion of an implicit connective would be redundant, they were asked to identify the non-connective expression in Arg2 that they took as the source of the perceived redundancy as the AltLex. (2) Allowance to represent the expression of discourse relations with syntactic constructions. Predicate Inversion: Crude as they were, these early PCs triggered explosive product development in desktop models for the home and office. (COMPARISON.CONCESSION.ARG1-AS-DENIER) AUX Inversion: Had the contest gone a full seven games, ABC could have reaped an extra $10 million in ad sales on the seventh game alone, compared with the ad take it would have received for regular prime-time shows. (CONTINGENCY.CONDITION.ARG2-AS-CONDITION)

  31. Outline Introduction PDTB enrichment Motivation New relations Sense revisions Guidelines revisions Changes to argument labeling convention Extensions to AltLex Identification Mapping to ISO-DR-Core Conclusion Separate effort for (really) full-text annotation

  32. Mapping to ISO-DR-Core ISO 24617-8: Effort to develop an international standard for the annotation of discourse relations. Provide clear and mutually consistent definitions of a set of core discourse relations (senses): ISO-DR-Core. Provide mappings from ISO-DR-Core relations to relations in different frameworks, including the PDTB. (Bunt and Prasad, 2016)

  33. Mapping to ISO-DR-Core Is the modified PDTB sense hierarchy mappable to the ISO-DR-Core relations? New senses with 1:1 mapping PURPOSE NEGATIVE CONDITION SIMILARITY MANNER New senses that do not have a correlate ARG2-AS-NEGGOAL (under Level-2 PURPOSE) NEGATIVE RESULT (under Level-2 CAUSE) Like the negative counterpart of condition, ISO-DR-Core should be extended to include the negative counterpart for CAUSE and PURPOSE. However, it remains an open question whether these relations should be defined in a way that captures both argument directionalities. In PDTB, no evidence yet for the reverse directionality for these senses.
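The mapping status reported on this slide can be captured as a small lookup. The sketch below records only what the slide states (which new senses map 1:1 and which have no correlate); it does not name the target ISO-DR-Core relations, since the slide does not list them. The identifiers `ONE_TO_ONE`, `NO_CORRELATE`, and `iso_dr_core_status` are hypothetical.

```python
# New PDTB-3 senses the slide reports as having a 1:1 ISO-DR-Core mapping.
ONE_TO_ONE = frozenset({"PURPOSE", "NEGATIVE CONDITION", "SIMILARITY", "MANNER"})

# New PDTB-3 senses the slide reports as having no ISO-DR-Core correlate.
NO_CORRELATE = frozenset({"ARG2-AS-NEGGOAL", "NEGATIVE RESULT"})


def iso_dr_core_status(sense: str) -> str:
    """Return the mapping status of a new PDTB-3 sense, as listed on the slide."""
    key = sense.strip().upper()
    if key in ONE_TO_ONE:
        return "1:1 mapping"
    if key in NO_CORRELATE:
        return "no correlate"
    # Senses not mentioned on the slide are deliberately left undecided.
    return "not listed"


statuses = {s: iso_dr_core_status(s) for s in ("Manner", "Negative Result", "Concession")}
```

Returning "not listed" rather than guessing keeps the lookup from asserting anything about senses the slide does not discuss.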

  34. Mapping to ISO-DR-Core We still have not covered the conceptual space for discourse relations. Desirable approach: characterize ontology by considering semantic possibilities, with a language-independent approach, and a corpus-independent approach

  35. Conclusion: Corpus Release and Consistency PDTB-3 expected to be distributed in Fall 2018, through Linguistic Data Consortium (http://www.ldc.upenn.edu) Corpus (LDC) Manual/guidelines and tools (LDC and PDTB website, http://www.seas.upenn.edu/~pdtb) Annotation Quality: Annotation Adjudication Additional consistency checking in PDTB-3 Merge of PDTB-3 and PDTB-2

  36. Full-Text Annotation of Discourse Relations A separate effort (Prasad et al., 2017) from the PDTB-3: annotation of cross-paragraph implicit relations that are not annotated in either PDTB-2 or PDTB-3. When merged with PDTB-3, this yields full-text annotation of discourse relations. Done over 145 texts (Sections 01, 06, and 23 of the corpus). However, annotation guidelines developed for the cross-paragraph annotation led to some departures from PDTB-2/3 guidelines in ways not incorporated in PDTB-3. To be distributed to the community separately, via GitHub (https://github.com/pdtb-upenn/full-text).

  37. Thank You. Questions?
