Unsupervised Learning of Probabilistic Grammars

undefined

On the Utility of Curricula in

Unsupervised Learning of Probabilistic Grammars

Kewei Tu and Vasant Honavar

Artificial Intelligence Research Laboratory

Department of Computer Science

Iowa State University

Outline



Unsupervised Grammar Learning



Grammar Learning with a Curriculum



The Incremental Construction Hypothesis



Theoretical Analysis



Empirical Support

Probabilistic Grammars



A probabilistic grammar is

a set of probabilistic

production rules

 that define a joint probability of a

grammatical structure and its sentence

Example from [Jurafsky & Martin, 2006]

P = 2.2 × 10

……

Probabilistic Grammars



Probabilistic grammars are widely used in



Natural language parsing



Bioinformatics, e.g., RNA structure modeling



Pattern recognition



Specifying grammars is hard



Machine learning offers a practical alternative

Learning a grammar from a corpus



Supervised Methods



Rely on a training corpus of sentences annotated with

grammatical structures (parses)



Unsupervised Methods



Do not require annotated data

A square is above the

triangle.

A triangle rolls.

The square rolls.

A triangle is above the

square.

A circle touches a square.

……



 NP VP

NP



 Det N

VP



 Vt NP (0.3)

| Vi PP (0.2)

| rolls (0.2)

| bounces(0.1)

……

Training Corpus

Probabilistic Grammar

Induction

Current Approaches



Process the entire corpus to learn the grammar

No, it wasn't Black Monday.  But

while the New York Stock

Exchange didn't fall apart Friday

as the Dow Jones Industrial

Average plunged 190.58 points --

most of it in the final hour -- it

barely managed to stay this side

of chaos. Some “circuit

breakers”' installed after the

October 1987 crash failed their

first test, traders say, unable to

cool the selling panic…

Image from www.editorsweblog.org

Image from www.christart.com

Grammar Learning with a Curriculum



Start with the

simplest

 sentences



Progress to

increasingly more complex

 sentences

Good.

Come here.

……

The rabbit is behind the tree.

Alice is sitting on the

riverbank.

……

Alice: I wonder if I've been

changed in the night? Let me think.

Was I the same when I got up this

morning? I almost think I can

remember feeling a little

different…

Image from www.ibirthdayclipart.com

Curriculum Learning

[Bengio et al., 2009]



A curriculum is a sequence of weighting schemes of the

training data:



 assigns more weight to “easier” training samples



Each subsequent weighting scheme assigns more weight to

“harder” samples



 assigns uniform weight to each sample



Learning is iterative



In each iteration, the learner is



initialized with the model learned during the previous

iteration



trained from the data weighted by the current weighting

scheme

Experiments



Learning a probabilistic dependency grammar from the

Wall Street Journal corpus of the Penn Treebank



Base learning algorithm



Expectation-maximization



Sentence complexity measure



Sentence length



Sentence likelihood given the learned grammar



Weight Assignment



0 or 1



A continuous function

Experimental Results



All of the four curricula help learning.

Questions



Under what conditions does a curriculum help in

unsupervised learning of probabilistic grammars?



How can we design good curricula?



How can we design algorithms that can take advantage of

the curricula?

The Incremental Construction Hypothesis



An ideal curriculum gradually emphasizes data samples

that help the learner to

successively discover new

substructures

 (i.e., grammar rules) of the target grammar,

which facilitates the learning.



We say a curriculum                              satisfies

incremental construction if:



For any

, the weighted training data correspond to a

sentence distribution defined by a probabilistic grammar



For any

 is a sub-grammar of

(See Section 3 of the paper for the more precise definitions)

Theoretical Analysis



Theorem:

If a curriculum satisfies incremental construction,

then for any          s.t.

, we have

where      is the      distance between the grammar rule

probabilities;          is the total variation distance between the

distributions of grammatical structures defined by the two

grammars.

undefined

Without a curriculum

With a curriculum

Intermediate grammars

Guidelines for Curriculum Design



A good curriculum should:



(approximately) satisfy incremental construction



effectively break down the target grammar into as many

chunks as possible



at each stage, introduce the new rule(s) that results in the

largest number of new sentences



if r

 is required for r

 to be used, then r

 shall be

introduced earlier than r



among rules with the same LFS, rules with larger

probabilities shall be introduced first

Guideline for Algorithm Design



Observation



the learning target at each stage of a curriculum is a partial

grammar



Guideline



avoid the over-fitting to this partial grammar that hinders

the acquisition of new grammar rules in later stages

Experiments on Synthetic Data



Data generated from the Treebank grammar of WSJ30



Curricula constructed based on the target grammar



Ideal

Satisfies all the guidelines



Sub-Ideal

Doesn’t satisfy the 3

rd

 guideline: randomly

choosing new grammar rules at each stage



Random

Doesn’t satisfy any guideline: randomly choosing

new sentences at each stage



Ideal-10

Sub-Ideal-10

Random-10

Introduce at

least 10 new sentences at each stage, hence containing fewer

stages



Length-based

Introduces new sentences based on their

lengths

Experiments on Synthetic Data

Length-based Curriculum



Very similar to the ideal curricula in this case (measured

by rank correlation)

Analysis on Real Data



Ideal curricula cannot be constructed in unsupervised

learning from real data



We find evidence that the length-based curriculum can be

seen as a proxy for an ideal curriculum on real data

Evidence from WSJ30



The introduction of grammar rules is spread throughout

the entire curriculum



More frequently used rules are introduced earlier

Evidence from WSJ30



Grammar rules introduced in earlier stages are always

used in sentences introduced in later stages

Evidence from WSJ30



In the sequence of intermediate grammars, most rule

probabilities first increase and then decrease, which

satisfies a relaxed definition of ideal curricula that satisfy

incremental construction

Conclusion



We have introduced the

incremental construction hypothesis



an explanation of the benefits of curricula in unsupervised

learning of probabilistic grammars.



a source of guidelines for designing curricula as well as

unsupervised grammar learning algorithms



The hypothesis is supported by both

theoretical analysis

and

experimental results

(on both synthetic and real data)

undefined

Thank You!

Q&A

undefined

Backup

undefined

: the length of the shortest sentence in the set of sentences

that use rule

undefined

Mean and std of the lengths of the sentences that use each rule

undefined

The change of probabilities of VBD headed rules with the stages

of the length-based curriculum in the treebank grammar.

Slide Note

Embed Share

Download

In the Artificial Intelligence Research Laboratory at Iowa State University, researchers study the utility of curricula in unsupervised learning of probabilistic grammars. They explore grammar learning with a curriculum, the incremental construction hypothesis, and provide theoretical analysis supported by empirical evidence. Probabilistic grammars, used widely in natural language parsing and bioinformatics, pose challenges in specifying grammars, making machine learning a practical alternative. The research delves into training corpus, probabilistic grammar induction, supervised versus unsupervised methods, and current approaches in processing entire text corpora for grammar learning.

shreehan Follow

Uploaded on Feb 21, 2025 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Artificial Intelligence Research Laboratory Department of Computer Science On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars Kewei Tu and Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science Iowa State University

Artificial Intelligence Research Laboratory Department of Computer Science Outline Unsupervised Grammar Learning Grammar Learning with a Curriculum The Incremental Construction Hypothesis Theoretical Analysis Empirical Support 2

Artificial Intelligence Research Laboratory Department of Computer Science Probabilistic Grammars A probabilistic grammar is a set of probabilistic production rules that define a joint probability of a grammatical structure and its sentence P = 2.2 10- 6 Example from [Jurafsky & Martin, 2006] 3

Artificial Intelligence Research Laboratory Department of Computer Science Probabilistic Grammars Probabilistic grammars are widely used in Natural language parsing Bioinformatics, e.g., RNA structure modeling Pattern recognition Specifying grammars is hard Machine learning offers a practical alternative 4

Artificial Intelligence Research Laboratory Department of Computer Science Learning a grammar from a corpus Training Corpus Probabilistic Grammar Induction S NP VP NP Det N VP Vt NP (0.3) | Vi PP (0.2) | rolls (0.2) | bounces(0.1) A square is above the triangle. A triangle rolls. The square rolls. A triangle is above the square. A circle touches a square. Supervised Methods Rely on a training corpus of sentences annotated with grammatical structures (parses) Unsupervised Methods Do not require annotated data 5

Artificial Intelligence Research Laboratory Department of Computer Science Current Approaches Process the entire corpus to learn the grammar No, it wasn't Black Monday. But while the New York Stock Exchange didn't fall apart Friday as the Dow Jones Industrial Average plunged 190.58 points -- most of it in the final hour -- it barely managed to stay this side of chaos. Some circuit breakers ' installed after the October 1987 crash failed their first test, traders say, unable to cool the selling panic Image from www.editorsweblog.org Image from www.christart.com 6

Artificial Intelligence Research Laboratory Department of Computer Science Grammar Learning with a Curriculum Good. Come here. The rabbit is behind the tree. Alice is sitting on the riverbank. Alice: I wonder if I've been changed in the night? Let me think. Was I the same when I got up this morning? I almost think I can remember feeling a little different Image from www.ibirthdayclipart.com Start with the simplest sentences Progress to increasingly more complex sentences 7

Artificial Intelligence Research Laboratory Department of Computer Science Curriculum Learning [Bengio et al., 2009] A curriculum is a sequence of weighting schemes of the training data: assigns more weight to easier training samples Each subsequent weighting scheme assigns more weight to harder samples assigns uniform weight to each sample Learning is iterative In each iteration, the learner is initialized with the model learned during the previous iteration trained from the data weighted by the current weighting scheme 8

Artificial Intelligence Research Laboratory Department of Computer Science Experiments Learning a probabilistic dependency grammar from the Wall Street Journal corpus of the Penn Treebank Base learning algorithm Expectation-maximization Sentence complexity measure Sentence length Sentence likelihood given the learned grammar Weight Assignment 0 or 1 A continuous function 9

Artificial Intelligence Research Laboratory Department of Computer Science Experimental Results All of the four curricula help learning. 10

Artificial Intelligence Research Laboratory Department of Computer Science Questions Under what conditions does a curriculum help in unsupervised learning of probabilistic grammars? How can we design good curricula? How can we design algorithms that can take advantage of the curricula? 11

Artificial Intelligence Research Laboratory Department of Computer Science The Incremental Construction Hypothesis An ideal curriculum gradually emphasizes data samples that help the learner to successively discover new substructures (i.e., grammar rules) of the target grammar, which facilitates the learning. We say a curriculum incremental construction if: For any , the weighted training data correspond to a sentence distribution defined by a probabilistic grammar For any , is a sub-grammar of (See Section 3 of the paper for the more precise definitions) satisfies 12

Artificial Intelligence Research Laboratory Department of Computer Science Theoretical Analysis Theorem:If a curriculum satisfies incremental construction, then for any s.t. , we have where probabilities; distributions of grammatical structures defined by the two grammars. is the distance between the grammar rule is the total variation distance between the 13

Intermediate grammars With a curriculum G0 Gn Without a curriculum 14

Artificial Intelligence Research Laboratory Department of Computer Science Guidelines for Curriculum Design A good curriculum should: (approximately) satisfy incremental construction effectively break down the target grammar into as many chunks as possible at each stage, introduce the new rule(s) that results in the largest number of new sentences if r1 is required for r2 to be used, then r1 shall be introduced earlier than r2 among rules with the same LFS, rules with larger probabilities shall be introduced first 15

Artificial Intelligence Research Laboratory Department of Computer Science Guideline for Algorithm Design Observation the learning target at each stage of a curriculum is a partial grammar Guideline avoid the over-fitting to this partial grammar that hinders the acquisition of new grammar rules in later stages 16

Artificial Intelligence Research Laboratory Department of Computer Science Experiments on Synthetic Data Data generated from the Treebank grammar of WSJ30 Curricula constructed based on the target grammar Ideal: Satisfies all the guidelines Sub-Ideal: Doesn t satisfy the 3rd guideline: randomly choosing new grammar rules at each stage Random: Doesn t satisfy any guideline: randomly choosing new sentences at each stage Ideal-10, Sub-Ideal-10, Random-10: Introduce at least 10 new sentences at each stage, hence containing fewer stages Length-based: Introduces new sentences based on their lengths 17

Artificial Intelligence Research Laboratory Department of Computer Science Experiments on Synthetic Data 18

Artificial Intelligence Research Laboratory Department of Computer Science Length-based Curriculum Very similar to the ideal curricula in this case (measured by rank correlation) 19

Artificial Intelligence Research Laboratory Department of Computer Science Analysis on Real Data Ideal curricula cannot be constructed in unsupervised learning from real data We find evidence that the length-based curriculum can be seen as a proxy for an ideal curriculum on real data 20

Artificial Intelligence Research Laboratory Department of Computer Science Evidence from WSJ30 The introduction of grammar rules is spread throughout the entire curriculum More frequently used rules are introduced earlier 21

Artificial Intelligence Research Laboratory Department of Computer Science Evidence from WSJ30 Grammar rules introduced in earlier stages are always used in sentences introduced in later stages 22

Artificial Intelligence Research Laboratory Department of Computer Science Evidence from WSJ30 In the sequence of intermediate grammars, most rule probabilities first increase and then decrease, which satisfies a relaxed definition of ideal curricula that satisfy incremental construction 23

Artificial Intelligence Research Laboratory Department of Computer Science Conclusion We have introduced the incremental construction hypothesis an explanation of the benefits of curricula in unsupervised learning of probabilistic grammars. a source of guidelines for designing curricula as well as unsupervised grammar learning algorithms The hypothesis is supported by both theoretical analysis and experimental results (on both synthetic and real data) 24