CSCI 5822 Probabilistic Models of Human and Machine Learning

CSCI 5822
Probabilistic Models of
Human and Machine Learning
Mike Mozer
Department of Computer Science and
Institute of Cognitive Science
University of Colorado at Boulder
Hidden Markov Models
 
Room Wandering
I’m going to wander around my house and tell you
objects I see.
Your task is to infer what room I’m in at every point in
time.
Observations

Sink → {bathroom, kitchen, laundry room}
Toilet → {bathroom}
Towel → {bathroom}
Bed → {bedroom}
Bookcase → {bedroom, living room}
Bench → {bedroom, living room, entry}
Television → {living room}
Couch → {living room}
Pillow → {living room, bedroom, entry}
Another Example:
The Occasionally Corrupt Casino
 
A casino uses a fair die most of the time, but occasionally
switches to a loaded one
Observation probabilities
Fair die: Prob(1) = Prob(2) = . . . = Prob(6) = 1/6
Loaded die: Prob(1) = Prob(2) = . . . = Prob(5) = 1/10, Prob(6) = ½
Transition probabilities
Prob(Fair | Loaded) = 0.01
Prob(Loaded | Fair) = 0.2
Transitions between states obey a Markov process
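To make the casino example concrete, here is a minimal sketch of its parameters as NumPy arrays; the state ordering (Fair = 0, Loaded = 1), the variable names, and the assumption that the casino starts with the fair die are choices of this sketch, not from the slides.

```python
import numpy as np

# States: 0 = Fair, 1 = Loaded (the ordering is an assumption of this sketch)
states = ["Fair", "Loaded"]

# Initial state distribution (assumed here: the casino starts with the fair die)
pi = np.array([1.0, 0.0])

# Transition matrix A[i, j] = Prob(next state j | current state i), from the slide:
# Prob(Loaded | Fair) = 0.2, Prob(Fair | Loaded) = 0.01
A = np.array([
    [0.80, 0.20],   # from Fair
    [0.01, 0.99],   # from Loaded
])

# Emission matrix B[i, k] = Prob(die face k+1 | state i)
B = np.array([
    [1/6] * 6,                          # fair die: uniform over 1..6
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],     # loaded die: 6 comes up half the time
])
```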
Another Example:
The Occasionally Corrupt Casino
 
Suppose we know how the casino operates, and we
observe a series of die tosses
    3 4 1 5 2 5 6 6 6 4 6 6 6 1 5 3
Can we infer which die was used?
           
F F F F F F L L L L L L L F F F
Inference requires examination of the sequence, not individual trials.
Your best guess about the current instant can be informed
by future observations.
Formalizing This Problem
Observations over time
Y(1), Y(2), Y(3), …
Hidden (unobserved) state
S(1), S(2), S(3), …
Hidden state is discrete
Here the observations are also discrete, but in general they may be continuous
Y(t) depends on S(t)
S(t+1) depends on S(t)
Hidden Markov Model
P(S_{1…T}, Y_{1…T}) = P(S_1) P(Y_1 | S_1) ∏_{t=2}^{T} P(S_t | S_{t-1}) P(Y_t | S_t)
Markov Process
Given the present state, earlier observations provide
no information about the future
Given the present state, past and future are
independent
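These two bullets can be written as conditional-independence statements; a sketch in standard HMM notation (the symbols S and Y follow the slides):

```latex
% Markov property of the hidden state chain:
P(S_{t+1} \mid S_1, \ldots, S_t, Y_1, \ldots, Y_t) = P(S_{t+1} \mid S_t)
% Each observation depends only on the current hidden state:
P(Y_t \mid S_1, \ldots, S_t, Y_1, \ldots, Y_{t-1}) = P(Y_t \mid S_t)
```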
 
Application Domains
Character recognition
Word / string recognition
Application Domains
Speech recognition
Application Domains
Action/Activity Recognition
Figures courtesy of B. K. Sin
HMM Is A Probabilistic Generative Model
[Figure: graphical model with a chain of hidden states, each emitting an observation]
Inference on HMM

State inference and estimation
P(S(t) | Y(1), …, Y(t))
Given a series of observations, what is the current hidden state?
P(S | Y)
Given a series of observations, what is the joint distribution over hidden states?
argmax_S [P(S | Y)]
Given a series of observations, what are the most likely values of the hidden states? (decoding problem)
Prediction
P(Y(t+1) | Y(1), …, Y(t))
Given a series of observations, what observation will come next?
Evaluation and Learning
P(Y | θ, ε, π)
Given a series of observations, what is the probability that they were generated by the model?
argmax_{θ, ε, π} P(Y | θ, ε, π)
What model parameters maximize the likelihood of the data?
Is Inference Hopeless?
Complexity is O(N^T)
[Figure: trellis over hidden states S_1 … S_T, each taking one of N values, with observations X_1 … X_T]
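To see why brute-force enumeration of state sequences is hopeless, a concrete calculation (the numbers are illustrative, not from the slides):

```latex
\text{With } N = 2 \text{ hidden states and } T = 100 \text{ time steps:}\quad
N^{T} = 2^{100} \approx 1.3 \times 10^{30} \text{ candidate state sequences.}
```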
State Inference: Forward Algorithm
Goal: Compute P(S_t | Y_{1…t}) ∝ P(S_t, Y_{1…t}) ≡ α_t(S_t)
Computational Complexity: O(T N^2)
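A minimal sketch of the forward recursion in Python, using the parameter layout of the casino sketch above; the function name and the choice to keep α unnormalized (matching α_t(S_t) = P(S_t, Y_{1…t})) are choices of this sketch.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward recursion: alpha[t, i] = P(S_t = i, Y_1..t = obs[:t+1])."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum over previous states, then emit
    return alpha

# Example with the casino parameters above (die faces coded 0..5, i.e. the slide's rolls minus 1):
# rolls = [2, 3, 0, 4, 1, 4, 5, 5, 5, 3, 5, 5, 5, 0, 4, 2]
# alpha = forward(pi, A, B, rolls)
# filtering posterior at the last step: alpha[-1] / alpha[-1].sum()
```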
Deriving The Forward Algorithm
Slide stolen from Dirk Husmeier
Notation change warning: n = current time (was t)
What Can We Do With α?
Notation change warning: n = current time (was t)
State Inference: Forward-Backward Algorithm
Goal: Compute P(S_t | Y_{1…T})
P(S_t | Y_{1…T}) ∝ P(S_t, Y_{1…T})
= P(S_t, Y_{t+1…T} | Y_{1…t}) P(Y_{1…t})
∝ P(S_t, Y_{t+1…T} | Y_{1…t})
= P(S_t | Y_{1…t}) P(Y_{t+1…T} | S_t, Y_{1…t})
= P(S_t | Y_{1…t}) P(Y_{t+1…T} | S_t)
∝ α_t(S_t) β_t(S_t)
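A sketch of the backward pass and the resulting smoothed posterior; it assumes the forward() sketch above and the convention β_t(i) = P(Y_{t+1…T} | S_t = i).

```python
import numpy as np

def backward(A, B, obs):
    """Backward recursion: beta[t, i] = P(Y_t+1..T | S_t = i)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                          # base case: beta[T-1, :] = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def smoothed_posterior(alpha, beta):
    """P(S_t | Y_1..T), proportional to alpha_t(S_t) * beta_t(S_t)."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Usage: smoothed_posterior(forward(pi, A, B, obs), backward(A, B, obs))
```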
Optimal State Estimation
Viterbi Algorithm:
Finding The Most Likely State Sequence
 
Slide stolen from Dirk Husmeier
Notation change warning: n = current time step (previously t), N = total number of time steps (previously T)
Viterbi Algorithm
 
Relation between Viterbi and forward algorithms
Viterbi uses max operator
 
Forward algorithm uses summation operator
 
Can recover state sequence by remembering best S at
each step n
Practical issue
Multiplying a long chain of probabilities -> numerical underflow
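A minimal Viterbi sketch under the same parameter layout as the earlier sketches; it works in log space (which also sidesteps the underflow issue noted above) and keeps backpointers to recover the best state sequence.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden state sequence, computed with log-probabilities and backpointers."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):            # log(0) = -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, N))                      # best log-prob of any path ending in state j at time t
    back = np.zeros((T, N), dtype=int)            # backpointer to the best predecessor
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # scores[i, j]: best path into i, then transition i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)                 # trace back from the best final state
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```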
Practical Trick: Operate With Logarithms
Prevents numerical underflow
Notation change warning: n = current time step (previously t), N = total number of time steps (previously T)
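A sketch of the same forward recursion carried out entirely with logarithms; scipy.special.logsumexp is one way to do the log-domain summation (a hand-rolled log-sum-exp would work equally well).

```python
import numpy as np
from scipy.special import logsumexp

def log_forward(pi, A, B, obs):
    """Forward recursion on log-probabilities: log_alpha[t, i] = log P(S_t = i, Y_1..t)."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):            # log(0) = -inf is fine here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    log_alpha = np.zeros((T, N))
    log_alpha[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # log-sum-exp replaces the sum over previous states
        log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0) + log_B[:, obs[t]]
    return log_alpha

# Total log-likelihood of the observation sequence: logsumexp(log_alpha[-1])
```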
Training HMM Parameters
Baum-Welch algorithm, a special case of Expectation-Maximization (EM)
1. Make an initial guess at the model parameters
2. Given the observation sequence, compute the hidden state posteriors P(S_t | Y_{1…T}, π, θ, ε) for t = 1 … T
3. Update the model parameters {π, θ, ε} based on the inferred states
Guaranteed to move uphill in the total probability of the observation sequence, P(Y_{1…T} | π, θ, ε)
May get stuck in local optima
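A sketch of step 3 (the parameter update), assuming step 2 has already produced the posteriors gamma[t, i] = P(S_t = i | Y_{1…T}, π, θ, ε) and xi[t, i, j] = P(S_t = i, S_{t+1} = j | Y_{1…T}, π, θ, ε); the names gamma and xi are conventional EM notation, not from the slides, and the observations are assumed to be integer-coded.

```python
import numpy as np

def m_step(gamma, xi, obs, n_symbols):
    """Re-estimate {pi, A, B} from the state posteriors (the M step of one Baum-Welch iteration)."""
    obs = np.asarray(obs)                                   # integer-coded observations, length T
    pi = gamma[0]                                           # posterior over the initial state
    A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # expected transitions / expected visits
    N = gamma.shape[1]
    B = np.zeros((N, n_symbols))
    for k in range(n_symbols):
        B[:, k] = gamma[obs == k].sum(axis=0)               # expected count of emitting symbol k
    B /= gamma.sum(axis=0)[:, None]
    return pi, A, B
```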
Updating Model Parameters
Using HMM For Classification
Suppose we want to recognize spoken digits 0, 1, …, 9
Each HMM is a model of the production of one digit and specifies P(Y | M_i)
Y: observed acoustic sequence
Note: Y can be a continuous RV
M_i: model for digit i
We want to compute the model posteriors P(M_i | Y)
Use Bayes’ rule
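A sketch of the classification step, assuming the log_forward() sketch above, one trained parameter set (pi, A, B) per digit model M_i, and placeholder priors; this sketch uses discrete emissions, whereas the slide notes Y can be continuous.

```python
import numpy as np
from scipy.special import logsumexp

def classify(models, log_priors, obs):
    """argmax_i P(M_i | Y), with P(M_i | Y) proportional to P(Y | M_i) P(M_i)."""
    log_post = np.array([
        logsumexp(log_forward(pi, A, B, obs)[-1]) + log_prior   # log P(Y | M_i) + log P(M_i)
        for (pi, A, B), log_prior in zip(models, log_priors)
    ])
    log_post -= logsumexp(log_post)          # normalize over models: log P(M_i | Y)
    return int(np.argmax(log_post)), np.exp(log_post)

# e.g., log_priors = np.log(np.full(10, 0.1)) for ten equally likely digit models
```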
Factorial HMM
 
Tree-Structured HMM
 
The Landscape
Discrete state space
HMM
Continuous state space
Linear dynamics
Kalman filter (exact inference)
Nonlinear dynamics
Particle filter (approximate inference)
The End
 
Cognitive Modeling
(Reynolds & Mozer, 2009)
[Four figure-only slides]
Speech Recognition
Given an audio waveform, we would like to robustly extract and recognize any spoken words
Statistical models can be used to
Provide greater robustness to noise
Adapt to the accents of different speakers
Learn from training data
S. Roweis, 2004