Unsupervised Learning of Probabilistic Grammars

undefined
On the Utility of Curricula in
Unsupervised Learning of Probabilistic Grammars
Kewei Tu and Vasant Honavar
Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Outline
Unsupervised Grammar Learning
Grammar Learning with a Curriculum
The Incremental Construction Hypothesis
Theoretical Analysis
Empirical Support
2
Probabilistic Grammars
A probabilistic grammar is 
a set of probabilistic
production rules
 that define a joint probability of a
grammatical structure and its sentence
 
Example from [Jurafsky & Martin, 2006]
 
P = 2.2 × 10
-
6
 
……
3
Probabilistic Grammars
Probabilistic grammars are widely used in
Natural language parsing
Bioinformatics, e.g., RNA structure modeling
Pattern recognition
Specifying grammars is hard
Machine learning offers a practical alternative
4
Learning a grammar from a corpus
 
 
 
 
 
 
 
Supervised Methods
Rely on a training corpus of sentences annotated with
grammatical structures (parses)
Unsupervised Methods
Do not require annotated data
A square is above the
triangle.
A triangle rolls.
The square rolls.
A triangle is above the
square.
A circle touches a square.
……
S  
 NP VP
NP 
 Det N
VP 
 Vt NP (0.3)
     
| Vi PP (0.2)
     
| rolls (0.2)
     
| bounces(0.1)
……
Training Corpus
 
Probabilistic Grammar
 
Induction
5
Current Approaches
Process the entire corpus to learn the grammar
No, it wasn't Black Monday.  But
while the New York Stock
Exchange didn't fall apart Friday
as the Dow Jones Industrial
Average plunged 190.58 points --
most of it in the final hour -- it
barely managed to stay this side
of chaos. Some “circuit
breakers”' installed after the
October 1987 crash failed their
first test, traders say, unable to
cool the selling panic…
 
Image from www.editorsweblog.org
 
Image from www.christart.com
6
Grammar Learning with a Curriculum
 
Start with the 
simplest
 sentences
Progress to 
increasingly more complex
 sentences
Good.
Come here.
……
The rabbit is behind the tree.
Alice is sitting on the
riverbank.
……
Alice: I wonder if I've been
changed in the night? Let me think.
Was I the same when I got up this
morning? I almost think I can
remember feeling a little
different…
 
Image from www.ibirthdayclipart.com
7
Curriculum Learning 
[Bengio et al., 2009]
 
A curriculum is a sequence of weighting schemes of the
training data:
     
 assigns more weight to “easier” training samples
Each subsequent weighting scheme assigns more weight to
“harder” samples
      
 assigns uniform weight to each sample
Learning is iterative
In each iteration, the learner is
initialized with the model learned during the previous
iteration
trained from the data weighted by the current weighting
scheme
8
Experiments
 
Learning a probabilistic dependency grammar from the
Wall Street Journal corpus of the Penn Treebank
Base learning algorithm
Expectation-maximization
Sentence complexity measure
Sentence length
Sentence likelihood given the learned grammar
Weight Assignment
0 or 1
A continuous function
9
Experimental Results
10
All of the four curricula help learning.
Questions
Under what conditions does a curriculum help in
unsupervised learning of probabilistic grammars?
How can we design good curricula?
How can we design algorithms that can take advantage of
the curricula?
11
The Incremental Construction Hypothesis
 
An ideal curriculum gradually emphasizes data samples
that help the learner to 
successively discover new
substructures
 (i.e., grammar rules) of the target grammar,
which facilitates the learning.
 
We say a curriculum                              satisfies
incremental construction if:
For any 
     
, the weighted training data correspond to a
sentence distribution defined by a probabilistic grammar
For any 
        
, 
     
 is a sub-grammar of
(See Section 3 of the paper for the more precise definitions)
12
Theoretical Analysis
Theorem:
 
If a curriculum satisfies incremental construction,
then for any          s.t. 
                             
, we have
 
where      is the      distance between the grammar rule
probabilities;          is the total variation distance between the
distributions of grammatical structures defined by the two
grammars.
13
undefined
G
0
G
n
 
Without a curriculum
 
With a curriculum
 
Intermediate grammars
14
Guidelines for Curriculum Design
A good curriculum should:
(approximately) satisfy incremental construction
effectively break down the target grammar into as many
chunks as possible
at each stage, introduce the new rule(s) that results in the
largest number of new sentences
if r
1
 is required for r
2
 to be used, then r
1
 shall be
introduced earlier than r
2
among rules with the same LFS, rules with larger
probabilities shall be introduced first
15
Guideline for Algorithm Design
Observation
the learning target at each stage of a curriculum is a partial
grammar
Guideline
avoid the over-fitting to this partial grammar that hinders
the acquisition of new grammar rules in later stages
16
Experiments on Synthetic Data
Data generated from the Treebank grammar of WSJ30
Curricula constructed based on the target grammar
Ideal
: 
Satisfies all the guidelines
Sub-Ideal
: 
Doesn’t satisfy the 3
rd
 guideline: randomly
choosing new grammar rules at each stage
Random
: 
Doesn’t satisfy any guideline: randomly choosing
new sentences at each stage
Ideal-10
, 
Sub-Ideal-10
, 
Random-10
: 
Introduce at
least 10 new sentences at each stage, hence containing fewer
stages
Length-based
: 
Introduces new sentences based on their
lengths
17
Experiments on Synthetic Data
18
Length-based Curriculum
Very similar to the ideal curricula in this case (measured
by rank correlation)
19
Analysis on Real Data
 
Ideal curricula cannot be constructed in unsupervised
learning from real data
We find evidence that the length-based curriculum can be
seen as a proxy for an ideal curriculum on real data
20
Evidence from WSJ30
The introduction of grammar rules is spread throughout
the entire curriculum
More frequently used rules are introduced earlier
21
Evidence from WSJ30
Grammar rules introduced in earlier stages are always
used in sentences introduced in later stages
22
Evidence from WSJ30
In the sequence of intermediate grammars, most rule
probabilities first increase and then decrease, which
satisfies a relaxed definition of ideal curricula that satisfy
incremental construction
23
Conclusion
We have introduced the 
incremental construction hypothesis
an explanation of the benefits of curricula in unsupervised
learning of probabilistic grammars.
a source of guidelines for designing curricula as well as
unsupervised grammar learning algorithms
The hypothesis is supported by both 
theoretical analysis
and 
experimental results 
(on both synthetic and real data)
24
undefined
Thank You!
Q&A
undefined
Backup
 
undefined
l
r 
: the length of the shortest sentence in the set of sentences
that use rule 
r
27
undefined
Mean and std of the lengths of the sentences that use each rule
28
undefined
 
The change of probabilities of VBD headed rules with the stages
of the length-based curriculum in the treebank grammar.
29
Slide Note
Embed
Share

In the Artificial Intelligence Research Laboratory at Iowa State University, researchers study the utility of curricula in unsupervised learning of probabilistic grammars. They explore grammar learning with a curriculum, the incremental construction hypothesis, and provide theoretical analysis supported by empirical evidence. Probabilistic grammars, used widely in natural language parsing and bioinformatics, pose challenges in specifying grammars, making machine learning a practical alternative. The research delves into training corpus, probabilistic grammar induction, supervised versus unsupervised methods, and current approaches in processing entire text corpora for grammar learning.

  • Artificial Intelligence
  • Probabilistic Grammars
  • Unsupervised Learning
  • Grammar Induction
  • Machine Learning.

Uploaded on Feb 21, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Artificial Intelligence Research Laboratory Department of Computer Science On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars Kewei Tu and Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science Iowa State University

  2. Artificial Intelligence Research Laboratory Department of Computer Science Outline Unsupervised Grammar Learning Grammar Learning with a Curriculum The Incremental Construction Hypothesis Theoretical Analysis Empirical Support 2

  3. Artificial Intelligence Research Laboratory Department of Computer Science Probabilistic Grammars A probabilistic grammar is a set of probabilistic production rules that define a joint probability of a grammatical structure and its sentence P = 2.2 10- 6 Example from [Jurafsky & Martin, 2006] 3

  4. Artificial Intelligence Research Laboratory Department of Computer Science Probabilistic Grammars Probabilistic grammars are widely used in Natural language parsing Bioinformatics, e.g., RNA structure modeling Pattern recognition Specifying grammars is hard Machine learning offers a practical alternative 4

  5. Artificial Intelligence Research Laboratory Department of Computer Science Learning a grammar from a corpus Training Corpus Probabilistic Grammar Induction S NP VP NP Det N VP Vt NP (0.3) | Vi PP (0.2) | rolls (0.2) | bounces(0.1) A square is above the triangle. A triangle rolls. The square rolls. A triangle is above the square. A circle touches a square. Supervised Methods Rely on a training corpus of sentences annotated with grammatical structures (parses) Unsupervised Methods Do not require annotated data 5

  6. Artificial Intelligence Research Laboratory Department of Computer Science Current Approaches Process the entire corpus to learn the grammar No, it wasn't Black Monday. But while the New York Stock Exchange didn't fall apart Friday as the Dow Jones Industrial Average plunged 190.58 points -- most of it in the final hour -- it barely managed to stay this side of chaos. Some circuit breakers ' installed after the October 1987 crash failed their first test, traders say, unable to cool the selling panic Image from www.editorsweblog.org Image from www.christart.com 6

  7. Artificial Intelligence Research Laboratory Department of Computer Science Grammar Learning with a Curriculum Good. Come here. The rabbit is behind the tree. Alice is sitting on the riverbank. Alice: I wonder if I've been changed in the night? Let me think. Was I the same when I got up this morning? I almost think I can remember feeling a little different Image from www.ibirthdayclipart.com Start with the simplest sentences Progress to increasingly more complex sentences 7

  8. Artificial Intelligence Research Laboratory Department of Computer Science Curriculum Learning [Bengio et al., 2009] A curriculum is a sequence of weighting schemes of the training data: assigns more weight to easier training samples Each subsequent weighting scheme assigns more weight to harder samples assigns uniform weight to each sample Learning is iterative In each iteration, the learner is initialized with the model learned during the previous iteration trained from the data weighted by the current weighting scheme 8

  9. Artificial Intelligence Research Laboratory Department of Computer Science Experiments Learning a probabilistic dependency grammar from the Wall Street Journal corpus of the Penn Treebank Base learning algorithm Expectation-maximization Sentence complexity measure Sentence length Sentence likelihood given the learned grammar Weight Assignment 0 or 1 A continuous function 9

  10. Artificial Intelligence Research Laboratory Department of Computer Science Experimental Results All of the four curricula help learning. 10

  11. Artificial Intelligence Research Laboratory Department of Computer Science Questions Under what conditions does a curriculum help in unsupervised learning of probabilistic grammars? How can we design good curricula? How can we design algorithms that can take advantage of the curricula? 11

  12. Artificial Intelligence Research Laboratory Department of Computer Science The Incremental Construction Hypothesis An ideal curriculum gradually emphasizes data samples that help the learner to successively discover new substructures (i.e., grammar rules) of the target grammar, which facilitates the learning. We say a curriculum incremental construction if: For any , the weighted training data correspond to a sentence distribution defined by a probabilistic grammar For any , is a sub-grammar of (See Section 3 of the paper for the more precise definitions) satisfies 12

  13. Artificial Intelligence Research Laboratory Department of Computer Science Theoretical Analysis Theorem:If a curriculum satisfies incremental construction, then for any s.t. , we have where probabilities; distributions of grammatical structures defined by the two grammars. is the distance between the grammar rule is the total variation distance between the 13

  14. Intermediate grammars With a curriculum G0 Gn Without a curriculum 14

  15. Artificial Intelligence Research Laboratory Department of Computer Science Guidelines for Curriculum Design A good curriculum should: (approximately) satisfy incremental construction effectively break down the target grammar into as many chunks as possible at each stage, introduce the new rule(s) that results in the largest number of new sentences if r1 is required for r2 to be used, then r1 shall be introduced earlier than r2 among rules with the same LFS, rules with larger probabilities shall be introduced first 15

  16. Artificial Intelligence Research Laboratory Department of Computer Science Guideline for Algorithm Design Observation the learning target at each stage of a curriculum is a partial grammar Guideline avoid the over-fitting to this partial grammar that hinders the acquisition of new grammar rules in later stages 16

  17. Artificial Intelligence Research Laboratory Department of Computer Science Experiments on Synthetic Data Data generated from the Treebank grammar of WSJ30 Curricula constructed based on the target grammar Ideal: Satisfies all the guidelines Sub-Ideal: Doesn t satisfy the 3rd guideline: randomly choosing new grammar rules at each stage Random: Doesn t satisfy any guideline: randomly choosing new sentences at each stage Ideal-10, Sub-Ideal-10, Random-10: Introduce at least 10 new sentences at each stage, hence containing fewer stages Length-based: Introduces new sentences based on their lengths 17

  18. Artificial Intelligence Research Laboratory Department of Computer Science Experiments on Synthetic Data 18

  19. Artificial Intelligence Research Laboratory Department of Computer Science Length-based Curriculum Very similar to the ideal curricula in this case (measured by rank correlation) 19

  20. Artificial Intelligence Research Laboratory Department of Computer Science Analysis on Real Data Ideal curricula cannot be constructed in unsupervised learning from real data We find evidence that the length-based curriculum can be seen as a proxy for an ideal curriculum on real data 20

  21. Artificial Intelligence Research Laboratory Department of Computer Science Evidence from WSJ30 The introduction of grammar rules is spread throughout the entire curriculum More frequently used rules are introduced earlier 21

  22. Artificial Intelligence Research Laboratory Department of Computer Science Evidence from WSJ30 Grammar rules introduced in earlier stages are always used in sentences introduced in later stages 22

  23. Artificial Intelligence Research Laboratory Department of Computer Science Evidence from WSJ30 In the sequence of intermediate grammars, most rule probabilities first increase and then decrease, which satisfies a relaxed definition of ideal curricula that satisfy incremental construction 23

  24. Artificial Intelligence Research Laboratory Department of Computer Science Conclusion We have introduced the incremental construction hypothesis an explanation of the benefits of curricula in unsupervised learning of probabilistic grammars. a source of guidelines for designing curricula as well as unsupervised grammar learning algorithms The hypothesis is supported by both theoretical analysis and experimental results (on both synthetic and real data) 24

  25. Artificial Intelligence Research Laboratory Department of Computer Science Thank You! Q&A

  26. Artificial Intelligence Research Laboratory Department of Computer Science Backup

  27. lr : the length of the shortest sentence in the set of sentences that use rule r 27

  28. Mean and std of the lengths of the sentences that use each rule 28

  29. The change of probabilities of VBD headed rules with the stages of the length-based curriculum in the treebank grammar. 29

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#