Ensembling Diverse Approaches to Question Answering

Raymond J. Mooney
Dept. of Computer Science
University of Texas at Austin
Diverse Types of Q/A
Factoid querying of open-domain raw text.
Compositional querying of a manually curated structured database or knowledge graph.
Compositional querying of a knowledge base/graph automatically constructed from raw text.
Reading-comprehension querying of specific documents.
Visual question answering from images and/or videos.
Open Domain Factoid Q/A
NIST TREC Q/A track initiated in 1998.
“Which museum in Florence was damaged by a major bomb explosion in 1993?”
Combine IR passage retrieval with limited NLP processing (NER, SRL, question classification).
[Diagram: Question + Document corpus → Q/A System → Answer String]
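As a rough illustration of this retrieve-then-extract recipe, here is a toy sketch (the two-document corpus is hypothetical, and the NER step is only described in a comment):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for an open-domain document collection.
corpus = ["The Uffizi gallery in Florence was damaged by a bomb in 1993.",
          "The Louvre in Paris holds the Mona Lisa."]
question = "Which museum in Florence was damaged by a major bomb explosion in 1993?"

# 1. IR passage retrieval: rank passages by TF-IDF similarity to the question.
vecs = TfidfVectorizer().fit_transform(corpus + [question])
scores = cosine_similarity(vecs[len(corpus)], vecs[:len(corpus)]).ravel()
best_passage = corpus[scores.argmax()]

# 2. Limited NLP: question classification would type the answer as a FACILITY,
#    and an NER system would then extract "Uffizi" from the best passage.
print(best_passage)
```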
Limitations of Factoid Q/A
The question must query a specific fact that is explicitly stated somewhere in the document corpus.
Does not allow aggregating or accumulating information across multiple information sources.
Does not require “deep compositional” semantics or inferential reasoning to generate an answer.
Learning Semantic Parsers for KB Q/A
Semantic parsers can be automatically learned from various forms of supervision:
Questions + logical forms
Questions + answers (latent logical forms)
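For concreteness, the two supervision formats might look like this (a hypothetical geography-domain example, not drawn from the datasets below):

```python
# Full supervision: each question is paired with its gold logical form.
full = {"question": "What is the capital of Texas?",
        "logical_form": "answer(capital(texas))"}

# Weaker supervision: only the answer is given; the logical form is a
# latent variable that the learner must induce during training.
weak = {"question": "What is the capital of Texas?", "answer": "Austin"}
```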
Recent Structured Query Datasets
Freebase queries (limited compositionality):
Free917 (Cai & Yates, 2013): “How many works did Mozart dedicate to Joseph Haydn?”
WebQuestions (Berant et al., 2013): “What music did Beethoven compose?”
Wikipedia table questions (more compositionality; must generalize to new test tables):
WikiTableQuestions (Pasupat & Liang, 2015): “How many runners took 2 minutes at the most to run 1500 meters?”
Limitations of Traditional Semantic Parsing
Requires a fixed, predetermined semantic ontology of types and relations.
Requires a pre-assembled database or knowledge base/graph.
Not easily generalized to open-domain Q/A.
Q/A for AKBC (Automated Knowledge Base Construction)
[Diagram: Document corpus → IE System → KB; Question + KB → Q/A System → Answer String]
Issues with AKBC Q/A
The KB may contain uncertainty and errors due to imperfect extraction.
Can create the ontology automatically using Unsupervised Semantic Parsing (USP): use relational clustering of words and phrases to automatically induce a “latent” set of semantic predicates for types and relations from dependency-parsed text (Poon & Domingos, 2008; Titov & Klementiev, 2011; Lewis & Steedman, 2013).
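A drastically simplified sketch of the relational-clustering idea (toy triples and plain KMeans, nothing like the full USP model): represent each relation phrase by the argument words it co-occurs with, so paraphrases such as “composed” and “wrote” can fall into one latent predicate.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Toy dependency-style triples (head, relation phrase, dependent).
triples = [("Mozart", "composed", "symphony"), ("Beethoven", "wrote", "sonata"),
           ("Mozart", "wrote", "opera"), ("Haydn", "composed", "quartet"),
           ("Florence", "located in", "Italy"), ("Paris", "located in", "France")]

# Represent each relation phrase by the bag of its argument words.
phrases = sorted({r for _, r, _ in triples})
contexts = [" ".join(a + " " + b for a, r2, b in triples if r2 == r)
            for r in phrases]
X = CountVectorizer().fit_transform(contexts)

# Cluster relation phrases into latent predicates.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(phrases, labels)))  # "composed"/"wrote" should ideally share a cluster
```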
Reading Comprehension Q/A
Answer questions that test comprehension of a specific document.
Use standardized tests of reading comprehension to evaluate performance (Hirschman et al., 1999; Riloff & Thelen, 2000; Ng et al., 2000; Charniak et al., 2000).
Sample Reading Comprehension Test
Large Scale Reading Comprehension Data
DeepMind’s large-scale data for reading comprehension Q/A (Hermann et al., 2015).
News articles used as source documents.
Questions constructed automatically from article summary sentences.
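The construction step amounts to a cloze transformation; a minimal sketch, assuming entity mentions have already been identified (the real pipeline also anonymizes entities with markers like @entity1):

```python
# Blank out one entity in a summary sentence to form the question;
# the deleted entity is the answer.
summary = "The @entity1 gallery in @entity2 reopened after the 1993 bombing."
answer = "@entity1"
question = summary.replace(answer, "@placeholder", 1)
print(question, "->", answer)  # "The @placeholder gallery in ..." -> "@entity1"
```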
Sample DeepMind Reading Comprehension Test
Deep LSTM Reader
DeepMind uses an LSTM recurrent neural net (RNN) to encode the document and query into a vector that is then used to predict the answer.
[Diagram: Document + Question → LSTM Encoder → Embedding → Answer Extractor → Answer]
Incorporated various forms of attention to focus the reader on answering the question while reading the document.
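In code, the core of such a reader might look like the following minimal PyTorch sketch (illustrative hyperparameters and a single-layer LSTM; not DeepMind's exact Deep LSTM Reader, and without the attention variants):

```python
import torch
import torch.nn as nn

class LSTMReader(nn.Module):
    """Encode document+query with an LSTM, then score candidate answers."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_candidates=500):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_candidates)  # answer-entity scores

    def forward(self, doc_query_ids):
        # doc_query_ids: (batch, seq_len) -- document and query concatenated
        x = self.embed(doc_query_ids)
        _, (h, _) = self.lstm(x)       # h: (1, batch, hidden_dim), final state
        return self.out(h.squeeze(0))  # (batch, num_candidates) answer logits
```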
Visual Question Answering (VQA)
Answer natural language questions about information in images.
A Virginia Tech/MSR group has put together the VQA dataset with ~750K questions over ~250K images (Antol et al., 2016).
VQA Examples
LSTM System for VQA
Hybrid RNN/Semantic-Parsing for Q/A
A recent approach composes a neural network for each specific question (Andreas et al., 2016).
Combines a syntactic parse with lexically-based component neural networks.
Tested on multiple types of Q/A:
Visual question answering
Geographical database queries
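The composition idea can be sketched with two toy modules (the module names, shapes, and parse-to-layout step are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class Find(nn.Module):
    """Attend to image regions that match a word (e.g., 'dog')."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)
    def forward(self, feats):  # feats: (batch, regions, dim)
        return torch.softmax(self.score(feats).squeeze(-1), dim=-1)

class Describe(nn.Module):
    """Predict an answer from attention-weighted image features."""
    def __init__(self, dim, n_answers):
        super().__init__()
        self.out = nn.Linear(dim, n_answers)
    def forward(self, feats, attn):
        return self.out((attn.unsqueeze(-1) * feats).sum(dim=1))

# "What color is the dog?" parses to the layout Describe(Find[dog](image)).
feats = torch.randn(1, 49, 512)  # 7x7 grid of image features
logits = Describe(512, 10)(feats, Find(512)(feats))
```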
Compositional Neural Nets for Q/A (Andreas et al., 2016)
Ensembling for Q/A
Ensembling multiple methods is a proven approach to generating robust, general-purpose systems.
Using supervised learning to train a meta-classifier to optimally combine multiple outputs, i.e. stacking (Wolpert, 1992), is particularly effective.
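scikit-learn provides stacking directly; a minimal sketch in which placeholder base models stand in for heterogeneous Q/A components:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data standing in for per-answer features from multiple systems.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Base systems (placeholders for diverse Q/A components).
base = [("svm", SVC(probability=True)), ("rf", RandomForestClassifier())]

# Meta-classifier trained on the base systems' outputs (Wolpert-style stacking).
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.predict(X[:5]))
```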
Stacking in IBM Watson
Stacking was used to combine evidence for each answer from many different components in the IBM Watson Jeopardy system (Ferrucci et al., 2010).
The meta-classifier was trained to correctly answer years of prior Jeopardy questions.
Stacking with Auxiliary Features for KBP Slot Filling
For each proposed slot fill, e.g. spouse(Barack, Michelle), combine multiple system confidences and additional features (Rajani & Mooney, ACL 2015).
[Diagram: conf 1 … conf N from Systems 1 … N, plus slot-type and provenance features → SVM meta-classifier → Accept?]
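A sketch of what the meta-classifier's input might look like (the exact feature layout and the toy data are assumptions; the slide only specifies per-system confidences plus slot-type and provenance features feeding an SVM):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical per-slot-fill record: one confidence per system (0 if a
# system did not propose the fill), a slot-type ID, and provenance features.
def featurize(confidences, slot_type_id, num_slot_types, prov_feats):
    slot_onehot = np.zeros(num_slot_types)
    slot_onehot[slot_type_id] = 1.0
    return np.concatenate([confidences, slot_onehot, prov_feats])

# Toy training set: 3 systems, 5 slot types, 2 provenance features.
rng = np.random.default_rng(0)
X = np.stack([featurize(rng.random(3), rng.integers(5), 5, rng.random(2))
              for _ in range(200)])
y = rng.integers(2, size=200)  # gold accept/reject labels

meta = SVC()  # SVM meta-classifier, as in the diagram
meta.fit(X, y)
```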
Ensembling Systems With and Without Prior Performance Data
Stacking restricts us to ensembling systems for which we have training data from previous years.
Use unsupervised ensembling to first combine confidence scores for “new” systems: employ the constrained-optimization approach of Weng et al. (2013).
Then use stacking to combine the output of the unsupervised ensemble with the supervised systems.
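As a toy illustration of constrained optimization for unsupervised confidence combination (the objective below is a stand-in of my own, not Weng et al.'s formulation): learn non-negative system weights, summing to one, that minimize disagreement with the weighted consensus.

```python
import numpy as np
from scipy.optimize import minimize

# conf[i, j]: confidence of system j in candidate fill i (toy data).
conf = np.random.default_rng(1).random((100, 4))

# Disagreement between each system and the weighted consensus.
def disagreement(w):
    consensus = conf @ w
    return np.mean((conf - consensus[:, None]) ** 2)

w0 = np.full(4, 0.25)
res = minimize(disagreement, w0,
               bounds=[(0, 1)] * 4,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
weights = res.x  # combination weights for the unsupervised ensemble
```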
Combining Supervised and Unsupervised Methods Using Stacking
[Diagram: supervised systems 1 … N each contribute conf 1 … conf N directly; unsupervised systems 1 … M are first combined by constrained optimization (Weng et al., 2013) into one calibrated confidence; all confidences plus slot-type and provenance features feed the SVM meta-classifier → Accept?]
New Auxiliary Provenance Features
Query Document Similarity Features: KBP SF queries come with a “query document” to disambiguate the query entity; for each system, compute the cosine similarity between its provenance document and the query document.
Provenance Document Similarity Features: for each system, compute the average cosine similarity between its provenance document and the provenance documents of all other systems.
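Both features reduce to TF-IDF cosine similarities; a minimal sketch with hypothetical toy documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical query document and one provenance document per system,
# for a proposed fill spouse(Barack, Michelle).
query_doc = "Barack Obama and Michelle Obama married in Chicago."
prov_docs = ["Obama married Michelle Robinson in 1992.",
             "Michelle Obama was born in Chicago.",
             "The senator gave a speech on health care."]

vecs = TfidfVectorizer().fit_transform([query_doc] + prov_docs)

# Query-document similarity feature: one value per system.
query_sim = cosine_similarity(vecs[0], vecs[1:]).ravel()

# Provenance-document similarity feature: each provenance document's
# average similarity to all other systems' provenance documents.
prov_sims = cosine_similarity(vecs[1:])
avg_prov_sim = (prov_sims.sum(axis=1) - 1.0) / (len(prov_docs) - 1)
```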
KBP 2015 Cold Start Slot Filling Ensembling Track Results

Approach                                                     Precision  Recall  F1
Combined supervised and unsupervised with new features       0.4314     0.4489  0.4679
Combined supervised and unsupervised without new features    0.4789     0.3588  0.4103
Only supervised stacking approach (ACL 2015)                 0.5084     0.2855  0.3657
Top-ranked CSSF 2015 system                                  0.3989     0.3058  0.3462
Constrained optimization on all systems (Weng et al., 2013)  0.1712     0.3998  0.2397
Stacking for General Q/A
A promising approach to building a general Q/A system is to stack a variety of the approaches we have reviewed, possibly using auxiliary provenance features.
Conclusions
There are many diverse types of question-answering scenarios.
There are many diverse approaches to each of these Q/A scenarios.
Ensembling these diverse approaches is a promising approach to building a general-purpose Q/A system.