
Semantic Annotation & Computational Pragmatics Overview
Semantic annotation is the addition of semantic information to raw data, such as text, speech, or video, in order to constrain its possible interpretations. This helps to deal with ambiguity and vagueness, which are pervasive in natural language. Standards such as ISO 24617 aim to give semantically annotated corpora a solid theoretical foundation and to make them interoperable across domains. The ISO 24617 Semantic Annotation Framework covers time and events, dialogue acts, semantic roles, principles of semantic annotation, and more. Requirements on annotation standards include the use of standoff annotation to preserve data integrity, and a clear distinction between annotation (the information added, at the conceptual level) and representation (the format in which it is rendered).
Presentation Transcript
Semantic Annotation & Computational Pragmatics Harry Bunt Tilburg University Lecture VU Brussel March 25, 2022
Outline
1. Semantic annotation and ISO standards
   a. Temporal information
   b. Semantic roles
   c. Discourse relations
2. Computational pragmatics: dialogue acts
3. Annotation theory: principles of semantic annotation
4. Dialogue annotation: ISO 24617-2
5. Plug-ins and annotation combination
Conclusions and perspectives
Semantic annotation is the addition of semantic information to raw (language) data: text, speech, video, multimodal data. Semantic annotations form constraints on the possible interpretations of the data. They help to deal with ambiguity and vagueness, which are ubiquitous in natural language. Example: "on Monday" is interpreted differently in each of these sentences:
Sears was more than five percent down on Monday.
I'm seeing Annie on Monday.
I always teach on Monday.
Semantic annotation standards
Semantically annotated corpora tend to:
- have ad hoc, corpus-specific ways of analysing and annotating the data;
- lack a solid theoretical underpinning;
- lack interoperability across approaches, platforms, and domains.
ISO (International Organization for Standardization) aims to develop concepts and methods that support the creation and maintenance of annotated corpus data in a way that is:
- theoretically and empirically justified;
- domain-independent;
- interoperable.
ISO 24617 Semantic Annotation Framework
- Part 1: Time and events (ISO-TimeML; Pustejovsky, 2012; based on TimeML, Pustejovsky, 2003)
- Part 2: Dialogue acts (Bunt, 2012; based on DIT++, Bunt, 2007)
- Part 4: Semantic roles (Palmer & Bunt, 2014)
- Part 6: Principles of semantic annotation (Bunt, 2016)
- Part 7: Spatial information (Pustejovsky & Lee, 2015)
- Part 8: Semantic relations in discourse (Prasad & Bunt, 2016)
- Part 9: Coreference (Romary, 2019)
- Part 11: Measurable quantitative information (Hao, Wang & Lee, 2020)
- Part 12: Quantification (Bunt, restarted February 2022)
- Part 14: Spatial semantics (Pustejovsky, Lee & Bunt, 2022)
Requirements on annotation standards
- Data integrity: use standoff annotation (rather than inline annotation), with markables to identify stretches of primary data.
- Distinguish annotation from representation.
  Conceptual level: an annotation is certain linguistic information that is added to language data, independent of its representation.
  Representation level: a representation is the format in which an annotation is rendered, independent of its content.
- Semantic adequacy: semantic annotations have an explicit semantics, which makes them machine-interpretable.
(ISO Principles of semantic annotation, ISO 24617-6)
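Here is a minimal Python sketch of standoff annotation. It assumes markables are encoded as character offsets into an unmodified primary text. The `Markable` class, the `make_markable` helper and the dictionary layout of the annotations are hypothetical illustrations, not part of any ISO standard. Annotations point at markables by identifier, so the primary data itself is never changed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markable:
    """A stretch of primary data, identified by character offsets (hypothetical encoding)."""
    id: str
    start: int  # inclusive
    end: int    # exclusive

    def text(self, primary: str) -> str:
        return primary[self.start:self.end]


def make_markable(mid: str, primary: str, surface: str) -> Markable:
    """Create a markable for the first occurrence of `surface` in the primary text."""
    start = primary.index(surface)
    return Markable(mid, start, start + len(surface))


primary = "John drove to Boston on Friday"  # primary data: only read, never edited
m1 = make_markable("m1", primary, "drove")
m2 = make_markable("m2", primary, "Friday")
markables = {m.id: m for m in (m1, m2)}

# Annotation elements refer to markables by id (standoff), not by embedding tags in the text.
annotations = [
    {"type": "event", "id": "e1", "target": "m1", "pred": "drive"},
    {"type": "timex3", "id": "t1", "target": "m2", "pred": "friday"},
]

for ann in annotations:
    print(ann["id"], "->", repr(markables[ann["target"]].text(primary)))
```

Because the primary text is only read, any number of independent annotation layers can point into it without interfering with one another.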
ISO-TimeML (24617-1): annotation of time and events
Example: John drove to Boston on Friday (m1 = "drove", m2 = "Friday")
<event xml:id="e1" target="#m1" pred="drive"/>
<timex3 xml:id="t1" target="#m2" pred="friday" type="date" value="2022-03-18"/>
<tLink eventID="#e1" timex3ID="#t1" relType="during"/>
Semantics (in first-order logic, FOL): $\exists e\,\exists t\,(drive(e) \land friday(t) \land date(t, \text{2022-03-18}) \land during(e,t))$
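The step from the XML annotation to the FOL formula can be sketched in Python with the standard-library `xml.etree.ElementTree` module. This is an illustrative translation, not the ISO-TimeML semantics itself. The wrapping `<annotation>` element is hypothetical, the element IDs double as logical variables, and only the three element types in this example are handled:

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:id attribute under the predefined XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

annotation = """<annotation>
  <event xml:id="e1" target="#m1" pred="drive"/>
  <timex3 xml:id="t1" target="#m2" pred="friday" type="date" value="2022-03-18"/>
  <tLink eventID="#e1" timex3ID="#t1" relType="during"/>
</annotation>"""


def to_fol(xml_text: str) -> str:
    """Translate event/timex3/tLink elements into an existentially closed conjunction."""
    root = ET.fromstring(xml_text)
    variables, conjuncts = [], []
    for ev in root.iter("event"):
        v = ev.get(XML_ID)
        variables.append(v)
        conjuncts.append(f"{ev.get('pred')}({v})")
    for tx in root.iter("timex3"):
        v = tx.get(XML_ID)
        variables.append(v)
        conjuncts.append(f"{tx.get('pred')}({v})")
        if tx.get("type") == "date":
            conjuncts.append(f"date({v}, {tx.get('value')})")
    for link in root.iter("tLink"):
        e = link.get("eventID").lstrip("#")
        t = link.get("timex3ID").lstrip("#")
        conjuncts.append(f"{link.get('relType')}({e},{t})")
    prefix = " ".join(f"∃{v}" for v in variables)
    return f"{prefix} ({' ∧ '.join(conjuncts)})"


print(to_fol(annotation))
# ∃e1 ∃t1 (drive(e1) ∧ friday(t1) ∧ date(t1, 2022-03-18) ∧ during(e1,t1))
```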
Semantic roles answer the question: Who did what to whom? (And how, for whom, for what reason, for what purpose, and with what result?)
Example: John [Agent] sent the book [Theme] to Boulder [Goal] by UPS [Instrument].
ISO 24617-4, Semantic Roles
The ISO standard provides:
- a definition of 28 semantic roles;
- a specification of their use in XML annotations.
ISO 24617-4, Semantic Roles
Example: John drove to Boston on Friday (m1 = "John", m2 = "drove", m3 = "Boston")
<event xml:id="e1" target="#m2" pred="drive"/>
<entity xml:id="x1" target="#m1" pred="john"/>
<entity xml:id="x2" target="#m3" pred="boston"/>
<srLink eventID="#e1" participant="#x1" semRole="agent"/>
<srLink eventID="#e1" participant="#x2" semRole="final-loc"/>
Semantics:
In FOL: $\exists e\,\exists x\,\exists y\,(drive(e) \land john(x) \land boston(y) \land agent(e,x) \land final\text{-}loc(e,y))$
As DRS: [ e, x, y | drive(e), john(x), boston(y), agent(e,x), final-loc(e,y) ]
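Here is a sketch of how the `srLink` elements above might be turned into a DRS. It assumes the events, entities and links have already been read from the XML into plain Python dictionaries and tuples. The `DRS` class and the `interpret_sr` function are hypothetical. Referents are named after the annotation IDs (e1, x1, x2) rather than e, x, y:

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    """A flat DRS: a list of discourse referents and a list of conditions."""
    referents: list[str] = field(default_factory=list)
    conditions: list[str] = field(default_factory=list)

    def __str__(self) -> str:
        return f"[ {', '.join(self.referents)} | {', '.join(self.conditions)} ]"


def interpret_sr(events: dict, entities: dict, sr_links: list) -> DRS:
    """One referent and one predicate condition per event/entity, one role condition per srLink."""
    drs = DRS()
    for ident, pred in list(events.items()) + list(entities.items()):
        drs.referents.append(ident)
        drs.conditions.append(f"{pred}({ident})")
    for event_id, participant_id, role in sr_links:
        drs.conditions.append(f"{role}({event_id},{participant_id})")
    return drs


drs = interpret_sr(
    events={"e1": "drive"},
    entities={"x1": "john", "x2": "boston"},
    sr_links=[("e1", "x1", "agent"), ("e1", "x2", "final-loc")],
)
print(drs)
# [ e1, x1, x2 | drive(e1), john(x1), boston(x2), agent(e1,x1), final-loc(e1,x2) ]
```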
ISO-TimeML (24617-1) + ISO 24617-4:
ISO 24617-1: John drove to Boston on Friday (m1 = "drove", m2 = "Friday")
ISO 24617-4: John drove to Boston on Friday (m1 = "John", m2 = "drove", m3 = "Boston")
ISO-TimeML (24617-1) + ISO 24617-4
John drove to Boston on Friday (m1 = "John", m2 = "drove", m3 = "Boston", m4 = "Friday")
<event xml:id="e1" target="#m2" pred="drive"/>
<entity xml:id="x1" target="#m1" pred="john"/>
<entity xml:id="x2" target="#m3" pred="boston"/>
<timex3 xml:id="t1" target="#m4" pred="friday" type="date" value="2022-03-18"/>
<srLink eventID="#e1" participant="#x1" semRole="agent"/>
<srLink eventID="#e1" participant="#x2" semRole="final-loc"/>
<tLink eventID="#e1" timex3ID="#t1" relType="during"/>
Semantics (DRS): [ e, x, y, t | drive(e), john(x), boston(y), friday(t), agent(e,x), final-loc(e,y), during(e,t), date(t, 2022-03-18) ]
Representation of sentence meaning!
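Semantically, combining the two annotations amounts to merging their DRSs: the union of the discourse referents and the union of the conditions. The sketch below assumes both annotations already use the same identifiers for shared elements, as in the combined XML above, where the drive event is e1 in both. Aligning two differently numbered markable sets is not shown, and the function names are hypothetical:

```python
def merge_drs(*boxes):
    """Merge DRSs given as (referents, conditions) pairs, keeping order and dropping duplicates."""
    referents, conditions = [], []
    for refs, conds in boxes:
        for r in refs:
            if r not in referents:
                referents.append(r)
        for c in conds:
            if c not in conditions:
                conditions.append(c)
    return referents, conditions


def render(box):
    refs, conds = box
    return f"[ {', '.join(refs)} | {', '.join(conds)} ]"


# DRS contributed by the ISO-TimeML annotation
timeml = (["e1", "t1"],
          ["drive(e1)", "friday(t1)", "date(t1, 2022-03-18)", "during(e1,t1)"])
# DRS contributed by the semantic-role annotation
semroles = (["e1", "x1", "x2"],
            ["drive(e1)", "john(x1)", "boston(x2)", "agent(e1,x1)", "final-loc(e1,x2)"])

print(render(merge_drs(timeml, semroles)))
```

The shared condition `drive(e1)` appears only once in the result, which is what makes the combined annotation a single representation of the sentence meaning rather than two side-by-side analyses.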
Complications
Not just proper names:
Since a few weeks the students go to a demonstration every Thursday.
The protesters threw a desk and two chairs out of the windows.
Noun phrases are quantifiers, with lots of complex semantic properties. ISO 24617-12: Annotation of quantification.
Complications
- Most sentences are syntactically and/or semantically more complex than these examples.
- Sentences seldom occur in isolation; they are usually part of a written text or spoken discourse.
- People do not write or speak sentences, but utterances, which have a function in a written or spoken text or dialogue.
Example: "And that's on Sunday too" (from a telephone dialogue with the Amsterdam Schiphol Airport information service)
Complications: the example in its dialogue context
C: Could you give me the departure times in the evening, please?
I: There are two evening flights, one at 7:30 and, ehm, one at 8:50.
C: And that's on Sunday too
I: And that's on Sunday too
Complications: how to deal with them
- Most sentences are syntactically and/or semantically more complex.
  Develop concepts and tools for dealing with complex syntactic-semantic structures (such as quantifying and modifying expressions).
- Sentences seldom occur in isolation, but rather as components of a text or spoken discourse.
  Develop concepts and tools for dealing with relations in discourse (e.g. coreference, question-answer).
- People do not write or speak sentences, but utterances, which have a function in a written or spoken text or dialogue.
  Develop concepts and tools for indicating communicative functions and rhetorical relations.
Discourse structure
ISO 24617-8 (2017; 2022): Core semantic relations in discourse
Discourse relations: relations between events, states, facts, conditions ("situations", including negated events, etc.) or beliefs, talked about in discourse (text or spoken dialogue).
Discourse relations
Example: Sears is negotiating to refinance the Sears Tower since they were unable to find a buyer for the building. (m1 = "Sears is negotiating to refinance the Sears Tower", m2 = "since", m3 = "they were unable to find a buyer for the building")
<drArg xml:id="e1" target="#m1" type="event"/>
<drArg xml:id="e2" target="#m3" type="event"/>
<dRel xml:id="r1" target="#m2" rel="cause"/>
<drLink rel="#r1" reason="#e2" result="#e1"/>
Semantics: $\exists e_1\,\exists e_2\,Cause(e_2, e_1)$
Meaning of discourse relations
Semantic Cause vs. Pragmatic Cause relation:
- John bought the book because he liked it.
- John is not in; he mailed me that he's still sick.
- Is that safe? Because this is a really expensive camera, you know.
Meaning of discourse relations
Semantic Cause vs. Pragmatic Cause relation:
- John bought the book because he liked it.
  <drArg xml:id="e1" target="#m1" type="event"/>
- Is that safe? Because this is a really expensive camera, you know.
  <drArg xml:id="e1" target="#m1" type="dialogAct"/>
- John is not in; he mailed me that he's still sick.
Discourse relations in dialogue
Ex. 1:
1. A: Where would you position the buttons?
2. A: I think that has an impact on many things.
Ex. 2:
A: I can never find my remote control.
B: That's because they don't have a fixed place.
ISO 24617-8 core discourse relations: Cause, Condition, Negative Condition, Concession, Contrast, Conjunction, Disjunction, Elaboration, Exception, Exemplification, Expansion, Manner, Purpose, Restatement, Similarity, Substitution, Synchrony, Asynchrony
Computational Pragmatics: Dialogue acts (DIT)
Dialogue acts
A dialogue act is a unit of communicative behaviour, produced by a sender and directed at one or more addressees, that has a communicative function. Most types of dialogue acts additionally have a semantic content (exceptions are, e.g., greetings). Dialogue acts may also have semantic dependence relations to other dialogue acts, such as answer to question, confirmation to verification, or decline-request to request.
Dialogue acts
In DIT, dialogue acts are viewed semantically as information-state update operators (or context update operators), consisting of a communicative function and a semantic content. The communicative function specifies a type of update operation; the semantic content is the material with which an addressee's information state is updated.
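Here is a toy Python sketch of this view. It assumes an information state is just a set of propositions (strings) and covers only two communicative functions. The update effects (`believes`, `wants_to_know`) and all function names are simplified, hypothetical placeholders, not the actual DIT update semantics:

```python
from typing import Callable, FrozenSet

State = FrozenSet[str]            # an addressee's information state, simplified to a set of propositions
Update = Callable[[State], State]


def inform(sender: str, content: str) -> Update:
    # Simplified effect: the addressee now believes that the sender believes the content.
    return lambda state: state | {f"believes({sender}, {content})"}


def set_question(sender: str, content: str) -> Update:
    # Simplified effect: the addressee now believes that the sender wants to know the content.
    return lambda state: state | {f"wants_to_know({sender}, {content})"}


# The communicative function selects the type of update operation ...
UPDATE_TYPES = {"inform": inform, "setQuestion": set_question}


def dialogue_act(function: str, sender: str, content: str) -> Update:
    # ... and the semantic content supplies the material for the update.
    return UPDATE_TYPES[function](sender, content)


state: State = frozenset()
state = dialogue_act("setQuestion", "p1", "departure_time(first_train)")(state)
state = dialogue_act("inform", "p2", "departure_time(first_train, 6.15)")(state)
print(sorted(state))
```

Separating the function (which operation) from the content (which material) means that a new communicative function only adds one entry to `UPDATE_TYPES`, independent of what the dialogue is about.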
Multidimensionality
Participating in a dialogue involves more than just pursuing a certain goal, task, or activity:
- giving and eliciting feedback
- taking turns
- managing the use of time
- establishing and maintaining contact
- dealing with social obligations (greeting, thanking, apologizing, ...)
- ...
Communication has many dimensions. Utterances in dialogue often have more than one communicative function: they are multifunctional. An utterance may have a function in more than one dimension.
Dialogue act theory (DIT)
A view of communication-as-action inspired by speech act theory.
Speech act theory: every utterance (i.e., stretch of speech by one speaker) encodes one speech act.
DIT: utterances (i.e., stretches of communicative behaviour by one sender) encode multiple dialogue acts; they are multifunctional:
- utterances may have parts, called functional segments, which are the units that encode dialogue acts (DAs);
- functional segments may encode multiple DAs;
- functional segments may be discontinuous, may overlap, and may stretch over multiple turns.
DA semantics: an update operation on the addressee's information state (Dynamic Interpretation Theory, DIT, a.k.a. Information State Update (ISU) semantics).
Multifunctionality
Example: A: Ehm, okay that's fine with me.
Communicative functions: Stalling, Feedback, Inform, Take Turn
Other forms of multifunctionality
Discontinuous multifunctionality:
U: Can you tell me what time is the first train to the airport on Sunday?
S: The first train to the airport on Sunday is at ... uhm, let me see... 5.32.
U: Thank you.
Overlapping multifunctionality:
U: Can you tell me what time is the first train to the airport on Sunday?
S: The first train to the airport on Sunday is at ... uhm, let me see... 5.32.
U: Thank you.
Interleaved multifunctionality:
I think 25 euros for a remote -- is that locally something like 10 pounds is too much money to buy an extra or a replacement one -- or is it even more?
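Discontinuous and interleaved functional segments can be modelled as tuples of character spans. The sketch below assumes one plausible segmentation of the interleaved example into an opinion (Inform) and a question. The segmentation, the `FunctionalSegment` class and the `interleaved` test are illustrative assumptions, not ISO 24617-2 definitions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalSegment:
    """A functional segment as one or more character spans of an utterance."""
    label: str
    spans: tuple  # (start, end) pairs in textual order, end exclusive

    def is_discontinuous(self) -> bool:
        return len(self.spans) > 1

    def text(self, utterance: str) -> str:
        return " ... ".join(utterance[s:e] for s, e in self.spans)


def span(utterance: str, piece: str) -> tuple:
    start = utterance.index(piece)
    return (start, start + len(piece))


def interleaved(a: FunctionalSegment, b: FunctionalSegment) -> bool:
    """True if a span of one segment lies inside a gap of the other (in either direction)."""
    def in_gap(x, y):
        return any(x.spans[i][1] <= s and e <= x.spans[i + 1][0]
                   for i in range(len(x.spans) - 1) for s, e in y.spans)
    return in_gap(a, b) or in_gap(b, a)


u = ("I think 25 euros for a remote -- is that locally something like 10 pounds "
     "is too much money to buy an extra or a replacement one -- or is it even more?")

opinion = FunctionalSegment("Inform", (
    span(u, "I think 25 euros for a remote"),
    span(u, "is too much money to buy an extra or a replacement one")))
question = FunctionalSegment("Question", (
    span(u, "is that locally something like 10 pounds"),
    span(u, "or is it even more?")))

print(opinion.text(u))
print(question.text(u))
print(interleaved(opinion, question))
```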
Requirements on annotation standards
- Data integrity: use standoff rather than inline annotation, with markables pointing to the stretches of primary data to which an annotation element applies.
- Distinguish annotation from representation.
  Conceptual level: an annotation is certain linguistic information that is added to language data, independent of its representation.
  Representation level: a representation is the format in which an annotation is rendered, independent of its content.
- Semantic adequacy: semantic annotations have an explicit semantics, which makes them machine-interpretable.
(ISO Principles of semantic annotation, ISO 24617-6)
Annotation scheme architecture
Three-part definition: $L_a = \langle AS_a, CS_a, Sem_a \rangle$:
- the abstract syntax $AS_a$ defines well-formed annotation structures;
- the concrete syntax $CS_a$ specifies representations of annotation structures;
- the semantics $Sem_a$ assigns meanings to annotation structures.
$AS_a = \langle CI_a, AC_a \rangle$: conceptual inventory $CI_a$; specification of structures $AC_a$.
$CS_a = \langle VC_a, CC_a \rangle$: vocabulary $VC_a$; specification of representation structures $CC_a$.
$Sem_a = \langle M_a, I_a \rangle$: model structure $M_a$; interpretation function $I_a$.
Annotation theory
A concrete syntax is ideal for a given abstract syntax if it has the following properties:
- Completeness: for every annotation structure defined by the abstract syntax, it defines a representation (a total encoding function).
- Unambiguity: every representation is the rendering of exactly one annotation structure (a decoding function, the inverse of encoding).
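Annotation theory
[Figure: the abstract syntax is linked by encoding functions $F_1$ and $F_2$ to ideal concrete syntax-1 and ideal concrete syntax-2, with decoding functions $F_1^{-1}$ and $F_2^{-1}$ in the opposite direction; conversion functions $C_{12}$ and $C_{21}$ map between the two concrete syntaxes; the semantics $I_a$ interprets the abstract syntax.]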
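The relation between abstract and concrete syntax can be sketched with two toy concrete syntaxes for the same abstract structures, here (markable, predicate) pairs. Both the XML format and the tab-separated format are hypothetical. If both encodings are ideal, a conversion between them can be obtained by decoding and re-encoding, e.g. $C_{12} = F_2 \circ F_1^{-1}$:

```python
import xml.etree.ElementTree as ET


# Abstract annotation structures (hypothetical): (markable, predicate) pairs.
def f1(struct):
    """F1: encode into a toy XML concrete syntax."""
    m, pred = struct
    el = ET.Element("entity", {"target": "#" + m, "pred": pred})
    return ET.tostring(el, encoding="unicode")


def f1_inv(rep):
    """F1^-1: decode the XML representation back into the abstract structure."""
    el = ET.fromstring(rep)
    return (el.get("target").lstrip("#"), el.get("pred"))


def f2(struct):
    """F2: encode into a toy tab-separated concrete syntax (assumes no tabs in values)."""
    return "\t".join(struct)


def f2_inv(rep):
    """F2^-1: decode the tab-separated representation."""
    m, pred = rep.split("\t")
    return (m, pred)


def c12(rep):
    """Convert syntax 1 -> syntax 2 via the abstract syntax."""
    return f2(f1_inv(rep))


def c21(rep):
    """Convert syntax 2 -> syntax 1 via the abstract syntax."""
    return f1(f2_inv(rep))


sample = [("m1", "john"), ("m2", "drive"), ("m3", "boston")]
# For an ideal concrete syntax, decoding inverts encoding on every structure.
assert all(f1_inv(f1(s)) == s for s in sample)
assert all(f2_inv(f2(s)) == s for s in sample)
print(repr(c12(f1(("m1", "john")))))
```

Checking round trips on a finite sample only illustrates completeness and unambiguity; it does not prove them. The tab-separated syntax, for instance, stops being unambiguous as soon as a value may itself contain a tab.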
Annotation structures
Two basic types of structure:
- Entity structures: semantic information about a segment of primary data. These are pairs <m, s>, with m a markable and s semantic information.
- Link structures: semantic information relating annotated segments of primary data, e.g. triples <e1, e2, R>.
DR-Core annotation and representation
Example: Bill fell. Carl pushed him. (m1 = "Bill fell", m2 = "Carl pushed him")
Abstract annotation structure: {<m1, e1>, <m2, e2>, <<m1, e1>, <m2, e2>, causerel>}
Concrete representation:
<dRel id="r1" rel="cause"/>
<drArg id="e1" target="#m1" argType="event"/>
<drArg id="e2" target="#m2" argType="event"/>
<drLink rel="#r1" reason="#e2" result="#e1"/>
Semantic interpretation: [ e1, e2 | Cause(e2, e1) ]
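Here is a Python sketch of the three levels for this example: abstract entity and link structures, an encoding into the concrete XML representation (element and attribute names copied from this slide), and an interpretation as a DRS. The dataclasses and functions are hypothetical. The argument order Cause(reason, result) follows the Sears example, where reason e2 and result e1 yield Cause(e2, e1):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityStructure:
    """<m, s>: a markable paired with semantic information (here: an argument id and type)."""
    markable: str
    arg: str
    arg_type: str = "event"


@dataclass(frozen=True)
class LinkStructure:
    """<e1, e2, R>: a relation between two entity structures."""
    result: EntityStructure
    reason: EntityStructure
    relation: str


def encode(entities, link, rel_id="r1"):
    """Encoding function: abstract structures -> concrete XML representation."""
    lines = [f'<dRel id="{rel_id}" rel="{link.relation}"/>']
    for e in entities:
        lines.append(f'<drArg id="{e.arg}" target="#{e.markable}" argType="{e.arg_type}"/>')
    lines.append(f'<drLink rel="#{rel_id}" reason="#{link.reason.arg}" result="#{link.result.arg}"/>')
    return "\n".join(lines)


def interpret(entities, link):
    """Semantics: one referent per argument, plus Relation(reason, result)."""
    refs = ", ".join(e.arg for e in entities)
    return f"[ {refs} | {link.relation.capitalize()}({link.reason.arg}, {link.result.arg}) ]"


fell = EntityStructure("m1", "e1")    # "Bill fell"
pushed = EntityStructure("m2", "e2")  # "Carl pushed him"
link = LinkStructure(result=fell, reason=pushed, relation="cause")

print(encode([fell, pushed], link))
print(interpret([fell, pushed], link))
```

The semantics is defined on the abstract structures, not on the XML. Any other ideal concrete syntax for the same structures would therefore receive the same interpretation.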
The ISO 24617-2 standard for dialogue act annotation
Dialogue act analysis frameworks
[Figure: lineage of dialogue act analysis frameworks. Theoretical roots: Speech Act Theory (Austin, Searle); Communication as Cooperation (Grice); Communicative Activity Analysis (Allwood); DIT. Annotation schemes: HCRC, TRAINS, Verbmobil-2, DAMSL and derived schemes, MRDA, GBG-IM, MATE, LIRICS, DIT++ (2005; Release 5, 2010), ISO 24617-2 (2012; Second Edition, 2020).]
ISO 24617-2 dialogue annotation standard
- A comprehensive, domain-independent taxonomy of dialogue acts, based on the DIT++ taxonomy (Bunt, 2005).
- Dialogue acts are defined semantically as update operators applied to participants' information states.
- Utterances may be multifunctional, due to the multiplicity of tasks in communicating.
- Dialogue annotation is multidimensional, assigning multiple dialogue acts to segments of dialogue in multiple dimensions.
- The taxonomy is organized according to the orthogonal DIT++ dimensions of communication.
ISO/DIT++ dimensions
1. Task: dialogue acts moving the underlying task/activity forward
2. Auto-Feedback: providing information about the speaker's processing of previous utterances
3. Allo-Feedback: providing or eliciting information about the addressee's processing of previous utterances
4. Turn Management: allocation of the speaker role
5. Time Management: managing the use of time
6. Contact Management: managing presence and attention
7. Discourse Structuring: explicitly structuring the dialogue
8. Own Communication Management: editing one's own speech
9. Partner Communication Management: editing the addressee's speech
10. Social Obligations Management: dealing with social conventions (greeting, thanking, apologizing, ...)
Communicative functions
General-purpose functions (21), applicable in any dimension, e.g.:
- Information-seeking functions: Propositional Question, Set Question, Check Question, Choice Question
- Information-providing functions: Inform, Confirm, Agreement, Disagreement, Correction, Warning
- Commissive functions: Promise, Offer, Accept Suggestion, Decline Suggestion, ...
- Directive functions: Request, Instruct, Suggestion, Accept Offer, Decline Offer, ...
Dimension-specific communicative functions (30), e.g.:
- Turn Take, Turn Accept, Turn Release (Turn Management)
- Stalling, Pausing (Time Management)
- Self-Correction (Own Communication Management)
- Completion (Partner Communication Management)
ISO 24617-2 and DIT++ general-purpose communicative functions
Information-transfer functions:
- Information-seeking (8): Question, Propositional Question, Check Question, Set Question, Choice Question
- Information-providing (9): Inform, Agreement, Disagreement, Correction, Answer, Confirm, Disconfirm
Action-discussion functions:
- Commissives (8): Offer, Promise, Address Request, Accept Request, Decline Request, Address Suggest, Accept Suggest, Decline Suggest
- Directives (7): Suggest, Request, Instruct, Address Offer, Accept Offer, Decline Offer
Dimension-specific communicative functions in ISO 24617-2
- Auto-Feedback: Positive, Negative
- Allo-Feedback: Positive, Negative, Elicitation
- Time Management: Stalling, Pausing
- Partner Communication Management: Completion, Correct-misspeaking
- Turn Management: turn-initial: Turn Take, Turn Grab, Turn Accept; turn-final: Turn Keep, Turn Release, Turn Assign
- Own Communication Management: Error signalling, Self-correction
- Discourse Structuring: Opening, Pre-closing
- Social Obligations Management: I-Greeting, R-Greeting, I-Self-Introduction, R-Self-Introduction, I-Apology, Accept-Apology, Thanking, Accept-Thanking, I-Goodbye, R-Goodbye
ISO communicative functions
51 communicative functions:
- 21 general-purpose functions: 4 information-seeking, 6 information-providing, 6 commissive, 5 directive
- 30 core dimension-specific functions: 2 auto-feedback, 3 allo-feedback, 6 turn management, 2 time management, 3 discourse structuring, 2 own communication management, 2 partner communication management, 10 social obligations management
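Here is a quick arithmetic check, in Python, that the per-category counts listed above add up to the stated totals:

```python
general_purpose = {
    "information-seeking": 4, "information-providing": 6,
    "commissive": 6, "directive": 5,
}
dimension_specific = {
    "auto-feedback": 2, "allo-feedback": 3, "turn management": 6,
    "time management": 2, "discourse structuring": 3,
    "own communication management": 2, "partner communication management": 2,
    "social obligations management": 10,
}

gp = sum(general_purpose.values())
ds = sum(dimension_specific.values())
print(gp, ds, gp + ds)  # 21 30 51
```

Eight of the ten dimensions appear in the dimension-specific list. Task and Contact Management have no dimension-specific functions in it.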
Other features of ISO 24617-2
- Qualifiers, e.g. for sentiment and certainty, for making fine-grained distinctions.
- Functional dependence relations between dialogue acts (e.g. Answer to Question, Confirmation to Check Question).
- Feedback dependence relations between a feedback act and its antecedent dialogue act.
- Rhetorical relations between dialogue acts or their semantic contents.
- The annotation language DiAML (Dialogue Act Markup Language), with:
  an abstract syntax (annotation structures as pairs, triples, ...);
  a concrete syntax defining XML representations;
  a semantics of annotation structures as information-state update operators.
DiAML abstract syntax
Dialogue act theory: a dialogue act is an octet <S, A, H, D, F, E, Q, c>:
- a sender S
- a set of addressees A
- a set of other participants H
- a dimension D
- a communicative function F
- a set of dependence relations E
- a set of qualifiers Q
- a semantic content c
ISO 24617-2 annotates all these components except the semantic content. The DiAML semantics interprets 7-tuples <S, A, H, D, F, E, Q> as functions that, applied to a semantic content, define information-state updates.
DiAML semantics
Information-state update semantics:
$I_{da}(\langle S, A, H, D, F, E, Q, c \rangle) = I_{da}(\langle S, A, H, D, F, E, Q \rangle)(I_{da}(c))$,
which is an update operation on information states.
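The factorisation in this equation can be mirrored with a curried function. The 7-tuple is interpreted first, which yields a function that takes the interpreted semantic content and returns an update operation. This is a sketch under simplifying assumptions: the `DiAMLStructure` class is hypothetical, the semantic content is passed through unchanged, and the resulting update merely records a marker in a set-based information state:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

State = FrozenSet[str]
Update = Callable[[State], State]


@dataclass(frozen=True)
class DiAMLStructure:
    """The 7-tuple <S, A, H, D, F, E, Q> that DiAML annotates (no semantic content)."""
    sender: str
    addressees: tuple
    others: tuple
    dimension: str
    function: str
    dependences: tuple = ()
    qualifiers: tuple = ()


def interpret_act(da: DiAMLStructure) -> Callable[[str], Update]:
    """I_da(<S,A,H,D,F,E,Q>): a function from an interpreted content to an update."""
    def with_content(content: str) -> Update:
        marker = f"{da.function}[{da.dimension}]({da.sender}, {content})"
        if "uncertain" in da.qualifiers:
            marker += " (uncertain)"
        return lambda state: state | {marker}
    return with_content


def interpret_content(c: str) -> str:
    """I_da(c): placeholder; the semantic content is taken as already interpreted."""
    return c


da3 = DiAMLStructure("p2", ("p1",), (), "task", "answer", ("da1",), ("uncertain",))
update = interpret_act(da3)(interpret_content("departs(first_train, 6.15)"))
print(update(frozenset()))
```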
DiAML concrete syntax
P1: What time is the first train on Sunday to the Airport?
P2: The first train on Sunday is at 6.15, I believe.
<diaml xmlns="http://www.iso.org/diaml/">
<dialogueAct xml:id="da1" target="#fs1" sender="#p1" addressee="#p2" communicativeFunction="setQuestion" dimension="task"/>
<dialogueAct xml:id="da2" target="#fs2.1" sender="#p2" addressee="#p1" communicativeFunction="autoPositive" dimension="autoFeedback" feedbackDependence="#fs1"/>
<dialogueAct xml:id="da3" target="#fs2" sender="#p2" addressee="#p1" communicativeFunction="answer" dimension="task" certainty="uncertain" functionalDependence="#da1"/>
</diaml>
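The DiAML example can be read with Python's `xml.etree.ElementTree`. The sketch below embeds a well-formed version of the XML from this slide, lists each dialogue act, and follows the functional dependence of da3 back to da1. ElementTree reports `xml:id` under the XML namespace URI and places the `dialogueAct` elements in the DiAML default namespace, so both are spelled out:

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://www.iso.org/diaml/"}              # DiAML namespace as given on the slide
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

DIAML = """<diaml xmlns="http://www.iso.org/diaml/">
  <dialogueAct xml:id="da1" target="#fs1" sender="#p1" addressee="#p2"
      communicativeFunction="setQuestion" dimension="task"/>
  <dialogueAct xml:id="da2" target="#fs2.1" sender="#p2" addressee="#p1"
      communicativeFunction="autoPositive" dimension="autoFeedback" feedbackDependence="#fs1"/>
  <dialogueAct xml:id="da3" target="#fs2" sender="#p2" addressee="#p1"
      communicativeFunction="answer" dimension="task" certainty="uncertain"
      functionalDependence="#da1"/>
</diaml>"""

root = ET.fromstring(DIAML)
acts = {da.get(XML_ID): da for da in root.findall("d:dialogueAct", NS)}

for ident, da in acts.items():
    print(ident, da.get("target"), da.get("dimension"), da.get("communicativeFunction"))

# Follow functional dependences back to the antecedent dialogue act.
for ident, da in acts.items():
    antecedent = da.get("functionalDependence")
    if antecedent:
        ante = acts[antecedent.lstrip("#")]
        print(f"{ident} ({da.get('communicativeFunction')}) depends on "
              f"{ante.get(XML_ID)} ({ante.get('communicativeFunction')})")
```

The two acts sent by P2 (da2 and da3) are annotated in different dimensions (autoFeedback and task) on overlapping functional segments (fs2.1 and fs2), which is the multidimensional annotation described earlier.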
The DialogBank
A language resource built at Tilburg University, currently being transferred to Saarland University: https://dialogbank.lsv.uni-saarland.de/
Dialogues annotated using the ISO 24617-2 gold standard:
- re-annotated dialogues from existing corpora, some with their original annotations and some with annotations from previous DIT++ versions;
- newly annotated dialogues from existing corpora without annotation;
- dialogues from newly collected corpora.
H. Bunt, V. Petukhova, A. Malchanau, A. Fang & K. Wijnhoven, 'The DialogBank: Dialogues with Interoperable Annotations.' Language Resources and Evaluation 53:213-249, 2019.
The DialogBank
[Table: the corpora in the DialogBank with their origin, language (EN or NL), original and previous representation formats, and original and previous dialogue act annotations. Corpora: HCRC Map Task, Switchboard, TRAINS, DBOX, DIAMOND, Dutch Map Task, OVIS, Schiphol Airport. Representation formats include NITE XML, DiAML-Anvil, DiAML-XML, 3-column and 13-column tabular formats, and plain text. Annotations include HCRC Map Task, SWBD-DAMSL and DAMSL communicative functions, DIT++ communicative functions and dimensions (versions 3.0 and 4.0), full ISO 24617-2 annotation, and no dialogue act annotation.]