CSCI 5822 Probabilistic Models of Human and Machine Learning
This presentation covers probabilistic models of human and machine learning, focusing on Hidden Markov Models. Through examples such as wandering between rooms and the occasionally corrupt casino, it shows how observations over time constrain hidden (unobserved) states, how the problem is formalized, and how inference in HMMs is carried out. It also discusses the Markov process assumption, state inference, decoding, and parameter learning.
Presentation Transcript
CSCI 5822 Probabilistic Models of Human and Machine Learning Mike Mozer Department of Computer Science and Institute of Cognitive Science University of Colorado at Boulder
Room Wandering I'm going to wander around my house and tell you the objects I see. Your task is to infer what room I'm in at every point in time.
Observations
Sink → {bathroom, kitchen, laundry room}
Toilet → {bathroom}
Towel → {bathroom}
Bed → {bedroom}
Bookcase → {bedroom, living room}
Bench → {bedroom, living room, entry}
Television → {living room}
Couch → {living room}
Pillow → {living room, bedroom, entry}
Another Example: The Occasionally Corrupt Casino
A casino uses a fair die most of the time, but occasionally switches to a loaded one.
Observation probabilities
Fair die: Prob(1) = Prob(2) = ... = Prob(6) = 1/6
Loaded die: Prob(1) = Prob(2) = ... = Prob(5) = 1/10, Prob(6) = 1/2
Transition probabilities
Prob(Fair | Loaded) = 0.01
Prob(Loaded | Fair) = 0.2
Transitions between states obey a Markov process.
Another Example: The Occasionally Corrupt Casino
Suppose we know how the casino operates, and we observe a series of die tosses:
3 4 1 5 2 5 6 6 6 4 6 6 6 1 5 3
Can we infer which die was used at each toss?
F F F F F F L L L L L L L F F F
Inference requires examination of the sequence, not individual trials: your best guess about the current instant can be informed by future observations.
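To make the setup concrete, here is a minimal Python/NumPy sketch (not from the slides) that encodes the casino as a two-state HMM and draws a roll sequence by ancestral sampling. The 50/50 initial state distribution and the variable names pi, A, B are assumptions of the sketch; the transition and emission probabilities are the ones given above, with the loaded die's Prob(6) = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# States: 0 = Fair, 1 = Loaded.  Observations: die faces 1..6, stored as 0..5.
pi = np.array([0.5, 0.5])             # initial state distribution (assumed; not given on the slide)
A = np.array([[0.80, 0.20],           # from Fair:   P(Fair|Fair) = 0.8,   P(Loaded|Fair) = 0.2
              [0.01, 0.99]])          # from Loaded: P(Fair|Loaded) = 0.01, P(Loaded|Loaded) = 0.99
B = np.array([[1/6] * 6,              # fair die: each face 1/6
              [0.1] * 5 + [0.5]])     # loaded die: faces 1-5 at 1/10, face 6 at 1/2

def sample_hmm(pi, A, B, T):
    """Ancestral sampling: draw a hidden state path, then one observation per state."""
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)
    for _ in range(T):
        states.append(s)
        obs.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(A.shape[1], p=A[s])
    return np.array(states), np.array(obs)

states, rolls = sample_hmm(pi, A, B, 16)
print("dice :", "".join("FL"[s] for s in states))
print("rolls:", (rolls + 1).tolist())     # report faces as 1..6
```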
Formalizing This Problem
Observations over time: Y(1), Y(2), Y(3), ...
Hidden (unobserved) state: S(1), S(2), S(3), ...
The hidden state is discrete; here the observations are also discrete, but they can be continuous.
Y(t) depends on S(t); S(t+1) depends on S(t).
Hidden Markov Model
hidden state → observations
P(S_1...T, Y_1...T) = P(S_1) P(Y_1 | S_1) ∏_{t=2}^{T} P(S_t | S_{t-1}) P(Y_t | S_t)
Markov Process
Given the present state, earlier observations provide no information about the future.
Given the present state, past and future are independent.
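A small sketch of the factorization above: evaluating log P(S_1...T, Y_1...T) for a given state/observation sequence. The function name and the pi/A/B array convention are choices of this sketch and match the casino example above.

```python
import numpy as np

def joint_log_prob(states, obs, pi, A, B):
    """log P(S_1...T, Y_1...T) = log[ P(S_1) P(Y_1|S_1) * prod_{t=2}^{T} P(S_t|S_{t-1}) P(Y_t|S_t) ]."""
    lp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for t in range(1, len(states)):
        lp += np.log(A[states[t - 1], states[t]]) + np.log(B[states[t], obs[t]])
    return lp

# e.g., score the casino sequence sampled above:
# joint_log_prob(states, rolls, pi, A, B)
```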
Markov Chain: Bigram Probability (slides by Bong-Kee Sin, bkshin@pknu.ac.kr)
Text sampled character by character from a bigram model is locally plausible but globally meaningless, e.g.: "lelwhe tersause dit ce thel eerewatorard a nd m anond welinorys th ne poncl areag mbeasstustacothee leeto thas a sated the ther efory thed anctle loakeed wellwas divid niony ar apead makd in thad sele habillim"
Application Domains: Character Modeling & Recognition
A Hidden Markov Model (HMM) is a doubly stochastic process with hidden states {Xt} and observations {Yt}: a dynamic latent variable model that explains {Yt} in terms of {Xt}. The model specifies variables and relations (Xt ∈ S, Yt ∈ R), e.g. a state space and transition network (FSN) over S = {1, 2, 3, 4}.
Evaluation: P(Y) = Σ_X P(Y, X)
Best explanation (decoding): argmax_X P(X | Y) = argmax_X P(Y | X) P(X)
Recognition: argmax_k P(X, Y | model k)
Word Recognition
A word is a concatenation of simple characters/letters, linguistically constrained (e.g., a network of letter models <a>...<z> and digit models <0>...<9>), possibly used in context.
Application Domains Speech recognition
Word Recognition (continued)
A word is a concatenation of simple characters/letters, linguistically constrained, and can be recognized in context.
Application Domains: Action/Activity Recognition (figures courtesy of B. K. Sin)
Dynamic process model: two hidden processes {Wt} and {Xt} (factored Markov chains, Wt ∈ {1, 2, ..., N_W}, Xt ∈ {1, 2, ..., N_X}) with transition probabilities a^W_ij and a^X_kl, and an observation process {Yt} modeled by a Gaussian or Gaussian mixture over feature vectors y_t. There are two parallel processes in walking; the model aims at decoding the gait.
Problem definition:
Observation vector sequence: Y_1:T = y_1 y_2 ... y_T
Best state sequences: W_1:T = W_1 W_2 ... W_T and X_1:T = X_1 X_2 ... X_T
Best joint state-observation sequence: argmax P(W_1:T, X_1:T, Y_1:T)
[Figure: sample decoding result showing the recovered W and X state trajectories over the frames of a pedestrian video sequence.]
HMM Is A Probabilistic Generative Model
hidden state → observations
Inference on HMM
State inference and estimation
P(S(t) | Y(1), ..., Y(t)): given a series of observations, what is the current hidden state?
P(S | Y): given a series of observations, what is the joint distribution over hidden states?
argmax_S [P(S | Y)]: given a series of observations, what is the most likely sequence of hidden state values? (decoding problem)
Prediction
P(Y(t+1) | Y(1), ..., Y(t)): given a series of observations, what observation will come next?
Evaluation and Learning
P(Y | π, A, B): given a series of observations, what is the probability that they were generated by the model (with initial-state, transition, and emission parameters π, A, B)?
argmax_{π, A, B} P(Y | π, A, B): what model parameters maximize the likelihood of the data?
Is Inference Hopeless?
[Figure: trellis of hidden states S_1 ... S_T, each taking one of N values, with observations X_1 ... X_T.]
Naively enumerating every hidden state sequence has complexity O(N^T).
State Inference: Forward Algorithm
Goal: Compute P(S_t | Y_1...t) ∝ P(S_t, Y_1...t) ≡ α_t(S_t)
Computational complexity: O(T N²)
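A hedged sketch of the forward recursion, with per-step rescaling so the filtered posteriors P(S_t | Y_1...t) stay numerically stable. The function name and the pi/A/B convention follow the earlier sketches and are not from the slides.

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward algorithm, O(T N^2): alpha_t(s) is proportional to P(S_t = s, Y_1...t).
    Rescaling each step yields the filtered posterior P(S_t | Y_1...t) and log P(Y_1...T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    loglik = 0.0
    for t in range(T):
        a = pi * B[:, obs[t]] if t == 0 else (alpha[t - 1] @ A) * B[:, obs[t]]
        c = a.sum()                 # scaling constant = P(Y_t | Y_1...t-1)
        alpha[t] = a / c
        loglik += np.log(c)
    return alpha, loglik            # alpha[t] = P(S_t | Y_1...t)

# Filtered posterior over {Fair, Loaded} after each casino roll:
# alpha, loglik = forward(rolls, pi, A, B)
```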
Deriving The Forward Algorithm
Notation change warning: n = current time (was t)
Slide stolen from Dirk Husmeier
What Can We Do With α?
Notation change warning: n = current time (was t)
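One thing we can do with α is one-step prediction. A small sketch, assuming the scaled forward pass above, that turns the filtered posterior at time t into P(S_{t+1} | Y_1...t) and P(Y_{t+1} | Y_1...t).

```python
import numpy as np

def one_step_prediction(alpha_t, A, B):
    """From the filtered posterior alpha_t = P(S_t | Y_1...t), predict one step ahead:
    P(S_{t+1} | Y_1...t) and the distribution of the next observation P(Y_{t+1} | Y_1...t)."""
    next_state = alpha_t @ A    # sum_s P(S_t = s | Y_1...t) P(S_{t+1} | S_t = s)
    next_obs = next_state @ B   # sum_s' P(S_{t+1} = s' | Y_1...t) P(Y_{t+1} | S_{t+1} = s')
    return next_state, next_obs

# e.g., probability that the next casino roll is a six:
# one_step_prediction(alpha[-1], A, B)[1][5]
```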
State Inference: Forward-Backward Algorithm
Goal: Compute P(S_t | Y_1...T)
P(S_t | Y_1...T) ∝ P(S_t, Y_1...T)
= P(S_t, Y_t+1...T | Y_1...t) P(Y_1...t)
∝ P(S_t, Y_t+1...T | Y_1...t)
= P(S_t | Y_1...t) P(Y_t+1...T | S_t, Y_1...t)
= P(S_t | Y_1...t) P(Y_t+1...T | S_t)
∝ α_t(S_t) β_t(S_t)
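A matching sketch of the backward pass and the smoothing step α_t(S_t) β_t(S_t); the per-step rescaling of β is a numerical convenience of this sketch, not part of the derivation above.

```python
import numpy as np

def backward(obs, A, B):
    """Backward pass: beta_t(s) is proportional to P(Y_{t+1}...T | S_t = s)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        b = A @ (B[:, obs[t + 1]] * beta[t + 1])   # sum over S_{t+1}
        beta[t] = b / b.sum()                      # rescale only to avoid underflow
    return beta

def smooth(alpha, beta):
    """Smoothed posteriors P(S_t | Y_1...T), proportional to alpha_t(S_t) beta_t(S_t)."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# gamma = smooth(forward(rolls, pi, A, B)[0], backward(rolls, A, B))
```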
Viterbi Algorithm: Finding The Most Likely State Sequence
Notation change warning: n = current time step (previously t), N = total number of time steps (previously T)
Slide stolen from Dirk Husmeier
Viterbi Algorithm
Relation between the Viterbi and forward algorithms:
Viterbi uses the max operator; the forward algorithm uses the summation operator.
The state sequence can be recovered by remembering the best S at each step n.
Practical issue: a long chain of probabilities -> underflow.
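A sketch of Viterbi decoding, the max-product counterpart of the forward pass, with backpointers to recover the best state sequence. It works in log space, anticipating the trick on the next slide; the function name and pi/A/B convention are carried over from the earlier sketches.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence argmax_S P(S | Y), computed in log space."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.zeros((T, N))              # best log-prob of any path ending in state s at time t
    back = np.zeros((T, N), dtype=int)    # backpointers to recover the best path
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA        # scores[i, j]: best path into i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Trace back from the best final state.
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# e.g., recover the F/L labels for the casino rolls:
# most_likely = viterbi(rolls, pi, A, B)
```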
Practical Trick: Operate With Logarithms
Notation change warning: n = current time step (previously t), N = total number of time steps (previously T)
Prevents numerical underflow.
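For completeness, a sketch of the same trick applied to the forward algorithm: the entire recursion runs in log space using log-sum-exp (here via scipy.special.logsumexp), so long products of probabilities never underflow.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(obs, pi, A, B):
    """Forward recursion entirely in log space: log-sum-exp replaces the sum over
    previous states, so long chains of probabilities never underflow."""
    logA, logB = np.log(A), np.log(B)
    log_alpha = np.log(pi) + logB[:, obs[0]]
    for y in obs[1:]:
        log_alpha = logsumexp(log_alpha[:, None] + logA, axis=0) + logB[:, y]
    return log_alpha, logsumexp(log_alpha)    # unnormalized log alpha_T(s) and log P(Y_1...T)
```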
Training HMM Parameters
Baum-Welch algorithm, a special case of Expectation-Maximization (EM):
1. Make an initial guess at the model parameters.
2. Given the observation sequence, compute the hidden state posteriors P(S_t | Y_1...T, π, A, B) for t = 1 ... T.
3. Update the model parameters {π, A, B} based on the inferred states.
Guaranteed to move uphill in the total probability of the observation sequence, P(Y_1...T | π, A, B), but may get stuck in local optima.
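A rough Baum-Welch sketch along the lines of steps 1-3, reusing the forward, backward, and smooth functions defined earlier. The expected-count updates are the standard EM updates for discrete-emission HMMs; initialization strategy, convergence checks, and numerical safeguards are omitted.

```python
import numpy as np

def baum_welch(obs, pi, A, B, n_iter=50):
    """Sketch of Baum-Welch (EM): alternate between computing state posteriors with
    forward-backward (E-step) and re-estimating (pi, A, B) from expected counts (M-step)."""
    obs = np.asarray(obs)
    pi, A, B = pi.copy(), A.copy(), B.copy()
    for _ in range(n_iter):
        # E-step: posteriors under the current parameters.
        alpha, _ = forward(obs, pi, A, B)                  # P(S_t | Y_1...t)
        beta = backward(obs, A, B)                         # scaled backward messages
        gamma = smooth(alpha, beta)                        # P(S_t | Y_1...T)
        xi = (alpha[:-1, :, None] * A[None] *              # P(S_t, S_{t+1} | Y_1...T), up to
              (B[:, obs[1:]].T * beta[1:])[:, None, :])    # the per-t normalization below
        xi /= xi.sum(axis=(1, 2), keepdims=True)
        # M-step: expected-count updates.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            B[:, k] = gamma[obs == k].sum(axis=0)          # expected count of emitting symbol k
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B
```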
Using HMM For Classification
Suppose we want to recognize spoken digits 0, 1, ..., 9.
Each HMM is a model of the production of one digit and specifies P(Y | M_i):
Y: observed acoustic sequence (note: Y can be a continuous RV)
M_i: model for digit i
We want to compute the model posteriors P(M_i | Y); use Bayes rule: P(M_i | Y) ∝ P(Y | M_i) P(M_i).
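A sketch of this classification recipe, assuming discrete observations and the forward function from earlier for the per-model log-likelihoods log P(Y | M_i); the uniform prior over digit models and the function name are assumptions of the sketch.

```python
import numpy as np
from scipy.special import logsumexp

def classify_digit(obs, digit_models, log_prior=None):
    """Bayes rule over per-digit HMMs: P(M_i | Y) is proportional to P(Y | M_i) P(M_i).
    Each model is a (pi, A, B) triple; log P(Y | M_i) comes from the forward algorithm."""
    if log_prior is None:                                  # uniform prior over digits
        log_prior = np.full(len(digit_models), -np.log(len(digit_models)))
    loglik = np.array([forward(obs, *m)[1] for m in digit_models])
    log_post = loglik + log_prior
    return int(np.argmax(log_post)), log_post - logsumexp(log_post)   # best digit, log P(M_i | Y)
```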
The Landscape
Discrete state space: HMM
Continuous state space, linear dynamics: Kalman filter (exact inference)
Continuous state space, nonlinear dynamics: particle filter (approximate inference)