CSCI 5822 Probabilistic Models of Human and Machine Learning
This presentation covers probabilistic models of human and machine learning, focusing on Hidden Markov Models. Through examples such as wandering between rooms and the occasionally corrupt casino, it shows how observations over time constrain hidden (unobserved) states, how the problem is formalized, and how inference in HMMs is carried out. It also discusses the Markov process assumption, state inference, decoding, and parameter learning.
Presentation Transcript
CSCI 5822 Probabilistic Models of Human and Machine Learning Mike Mozer Department of Computer Science and Institute of Cognitive Science University of Colorado at Boulder
Room Wandering I'm going to wander around my house and tell you the objects I see. Your task is to infer what room I'm in at every point in time.
Observations
Sink → {bathroom, kitchen, laundry room}
Toilet → {bathroom}
Towel → {bathroom}
Bed → {bedroom}
Bookcase → {bedroom, living room}
Bench → {bedroom, living room, entry}
Television → {living room}
Couch → {living room}
Pillow → {living room, bedroom, entry}
Another Example: The Occasionally Corrupt Casino
A casino uses a fair die most of the time, but occasionally switches to a loaded one.
Observation probabilities
Fair die: Prob(1) = Prob(2) = ... = Prob(6) = 1/6
Loaded die: Prob(1) = Prob(2) = ... = Prob(5) = 1/10, Prob(6) = 1/2
Transition probabilities
Prob(Fair | Loaded) = 0.01
Prob(Loaded | Fair) = 0.2
Transitions between states obey a Markov process.
Another Example: The Occasionally Corrupt Casino
Suppose we know how the casino operates, and we observe a series of die tosses:
3 4 1 5 2 5 6 6 6 4 6 6 6 1 5 3
Can we infer which die was used at each toss?
F F F F F F L L L L L L L F F F
Inference requires examination of the sequence, not individual trials: your best guess about the current instant can be informed by future observations.
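To make the setup concrete, here is a minimal Python/NumPy sketch (not from the slides) that encodes the casino as a two-state HMM and draws a roll sequence by ancestral sampling. The 50/50 initial state distribution and the variable names pi, A, B are assumptions of the sketch; the transition and emission probabilities are the ones given above, with the loaded die's Prob(6) = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# States: 0 = Fair, 1 = Loaded.  Observations: die faces 1..6, stored as 0..5.
pi = np.array([0.5, 0.5])             # initial state distribution (assumed; not given on the slide)
A = np.array([[0.80, 0.20],           # from Fair:   P(Fair|Fair) = 0.8,   P(Loaded|Fair) = 0.2
              [0.01, 0.99]])          # from Loaded: P(Fair|Loaded) = 0.01, P(Loaded|Loaded) = 0.99
B = np.array([[1/6] * 6,              # fair die: each face 1/6
              [0.1] * 5 + [0.5]])     # loaded die: faces 1-5 at 1/10, face 6 at 1/2

def sample_hmm(pi, A, B, T):
    """Ancestral sampling: draw a hidden state path, then one observation per state."""
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)
    for _ in range(T):
        states.append(s)
        obs.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(A.shape[1], p=A[s])
    return np.array(states), np.array(obs)

states, rolls = sample_hmm(pi, A, B, 16)
print("dice :", "".join("FL"[s] for s in states))
print("rolls:", (rolls + 1).tolist())     # report faces as 1..6
```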
Formalizing This Problem
Observations over time: Y(1), Y(2), Y(3), ...
Hidden (unobserved) state: S(1), S(2), S(3), ...
The hidden state is discrete; here the observations are also discrete, but they can be continuous.
Y(t) depends on S(t); S(t+1) depends on S(t).
Hidden Markov Model
hidden state → observations
P(S_1...T, Y_1...T) = P(S_1) P(Y_1 | S_1) ∏_{t=2}^{T} P(S_t | S_{t-1}) P(Y_t | S_t)
Markov Process
Given the present state, earlier observations provide no information about the future.
Given the present state, past and future are independent.
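A small sketch of the factorization above: evaluating log P(S_1...T, Y_1...T) for a given state/observation sequence. The function name and the pi/A/B array convention are choices of this sketch and match the casino example above.

```python
import numpy as np

def joint_log_prob(states, obs, pi, A, B):
    """log P(S_1...T, Y_1...T) = log[ P(S_1) P(Y_1|S_1) * prod_{t=2}^{T} P(S_t|S_{t-1}) P(Y_t|S_t) ]."""
    lp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for t in range(1, len(states)):
        lp += np.log(A[states[t - 1], states[t]]) + np.log(B[states[t], obs[t]])
    return lp

# e.g., score the casino sequence sampled above:
# joint_log_prob(states, rolls, pi, A, B)
```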
Markov Chain: Bigram Probability (slides by Bong-Kee Sin, bkshin@pknu.ac.kr)
Text sampled character by character from a bigram model is locally plausible but globally meaningless, e.g.: "lelwhe tersause dit ce thel eerewatorard a nd m anond welinorys th ne poncl areag mbeasstustacothee leeto thas a sated the ther efory thed anctle loakeed wellwas divid niony ar apead makd in thad sele habillim"
Application Domains: Character Modeling & Recognition
A Hidden Markov Model (HMM) is a doubly stochastic process with hidden states {Xt} and observations {Yt}: a dynamic latent variable model that explains {Yt} in terms of {Xt}. The model specifies variables and relations (Xt ∈ S, Yt ∈ R), e.g. a state space and transition network (FSN) over S = {1, 2, 3, 4}.
Evaluation: P(Y) = Σ_X P(Y, X)
Best explanation (decoding): argmax_X P(X | Y) = argmax_X P(Y | X) P(X)
Recognition: argmax_k P(X, Y | model k)
Word Recognition
A word is a concatenation of simple characters/letters, linguistically constrained (e.g., a network of letter models <a>...<z> and digit models <0>...<9>), possibly used in context.
Application Domains Speech recognition
Word Recognition (continued)
A word is a concatenation of simple characters/letters, linguistically constrained, and can be recognized in context.
Application Domains: Action/Activity Recognition (figures courtesy of B. K. Sin)
Dynamic process model: two hidden processes {Wt} and {Xt} (factored Markov chains, Wt ∈ {1, 2, ..., N_W}, Xt ∈ {1, 2, ..., N_X}) with transition probabilities a^W_ij and a^X_kl, and an observation process {Yt} modeled by a Gaussian or Gaussian mixture over feature vectors y_t. There are two parallel processes in walking; the model aims at decoding the gait.
Problem definition:
Observation vector sequence: Y_1:T = y_1 y_2 ... y_T
Best state sequences: W_1:T = W_1 W_2 ... W_T and X_1:T = X_1 X_2 ... X_T
Best joint state-observation sequence: argmax P(W_1:T, X_1:T, Y_1:T)
[Figure: sample decoding result showing the recovered W and X state trajectories over the frames of a pedestrian video sequence.]
HMM Is A Probabilistic Generative Model
hidden state → observations
Inference on HMM
State inference and estimation
P(S(t) | Y(1), ..., Y(t)): given a series of observations, what is the current hidden state?
P(S | Y): given a series of observations, what is the joint distribution over hidden states?
argmax_S [P(S | Y)]: given a series of observations, what is the most likely sequence of hidden state values? (decoding problem)
Prediction
P(Y(t+1) | Y(1), ..., Y(t)): given a series of observations, what observation will come next?
Evaluation and Learning
P(Y | π, A, B): given a series of observations, what is the probability that they were generated by the model (with initial-state, transition, and emission parameters π, A, B)?
argmax_{π, A, B} P(Y | π, A, B): what model parameters maximize the likelihood of the data?
Is Inference Hopeless?
[Figure: trellis of hidden states S_1 ... S_T, each taking one of N values, with observations X_1 ... X_T.]
Naively enumerating every hidden state sequence has complexity O(N^T).
State Inference: Forward Algorithm
Goal: Compute P(S_t | Y_1...t) ∝ P(S_t, Y_1...t) ≡ α_t(S_t)
Computational complexity: O(T N²)
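A hedged sketch of the forward recursion, with per-step rescaling so the filtered posteriors P(S_t | Y_1...t) stay numerically stable. The function name and the pi/A/B convention follow the earlier sketches and are not from the slides.

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward algorithm, O(T N^2): alpha_t(s) is proportional to P(S_t = s, Y_1...t).
    Rescaling each step yields the filtered posterior P(S_t | Y_1...t) and log P(Y_1...T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    loglik = 0.0
    for t in range(T):
        a = pi * B[:, obs[t]] if t == 0 else (alpha[t - 1] @ A) * B[:, obs[t]]
        c = a.sum()                 # scaling constant = P(Y_t | Y_1...t-1)
        alpha[t] = a / c
        loglik += np.log(c)
    return alpha, loglik            # alpha[t] = P(S_t | Y_1...t)

# Filtered posterior over {Fair, Loaded} after each casino roll:
# alpha, loglik = forward(rolls, pi, A, B)
```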
Deriving The Forward Algorithm
Notation change warning: n = current time (was t)
Slide stolen from Dirk Husmeier
What Can We Do With α?
Notation change warning: n = current time (was t)
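One thing we can do with α is one-step prediction. A small sketch, assuming the scaled forward pass above, that turns the filtered posterior at time t into P(S_{t+1} | Y_1...t) and P(Y_{t+1} | Y_1...t).

```python
import numpy as np

def one_step_prediction(alpha_t, A, B):
    """From the filtered posterior alpha_t = P(S_t | Y_1...t), predict one step ahead:
    P(S_{t+1} | Y_1...t) and the distribution of the next observation P(Y_{t+1} | Y_1...t)."""
    next_state = alpha_t @ A    # sum_s P(S_t = s | Y_1...t) P(S_{t+1} | S_t = s)
    next_obs = next_state @ B   # sum_s' P(S_{t+1} = s' | Y_1...t) P(Y_{t+1} | S_{t+1} = s')
    return next_state, next_obs

# e.g., probability that the next casino roll is a six:
# one_step_prediction(alpha[-1], A, B)[1][5]
```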
State Inference: Forward-Backward Algorithm
Goal: Compute P(S_t | Y_1...T)
P(S_t | Y_1...T) ∝ P(S_t, Y_1...T)
= P(S_t, Y_t+1...T | Y_1...t) P(Y_1...t)
∝ P(S_t, Y_t+1...T | Y_1...t)
= P(S_t | Y_1...t) P(Y_t+1...T | S_t, Y_1...t)
= P(S_t | Y_1...t) P(Y_t+1...T | S_t)
∝ α_t(S_t) β_t(S_t)
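A matching sketch of the backward pass and the smoothing step α_t(S_t) β_t(S_t); the per-step rescaling of β is a numerical convenience of this sketch, not part of the derivation above.

```python
import numpy as np

def backward(obs, A, B):
    """Backward pass: beta_t(s) is proportional to P(Y_{t+1}...T | S_t = s)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        b = A @ (B[:, obs[t + 1]] * beta[t + 1])   # sum over S_{t+1}
        beta[t] = b / b.sum()                      # rescale only to avoid underflow
    return beta

def smooth(alpha, beta):
    """Smoothed posteriors P(S_t | Y_1...T), proportional to alpha_t(S_t) beta_t(S_t)."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# gamma = smooth(forward(rolls, pi, A, B)[0], backward(rolls, A, B))
```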
Viterbi Algorithm: Finding The Most Likely State Sequence
Notation change warning: n = current time step (previously t), N = total number of time steps (previously T)
Slide stolen from Dirk Husmeier
Viterbi Algorithm
Relation between the Viterbi and forward algorithms:
Viterbi uses the max operator; the forward algorithm uses the summation operator.
The state sequence can be recovered by remembering the best S at each step n.
Practical issue: a long chain of probabilities -> underflow.
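A sketch of Viterbi decoding, the max-product counterpart of the forward pass, with backpointers to recover the best state sequence. It works in log space, anticipating the trick on the next slide; the function name and pi/A/B convention are carried over from the earlier sketches.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence argmax_S P(S | Y), computed in log space."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.zeros((T, N))              # best log-prob of any path ending in state s at time t
    back = np.zeros((T, N), dtype=int)    # backpointers to recover the best path
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA        # scores[i, j]: best path into i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Trace back from the best final state.
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# e.g., recover the F/L labels for the casino rolls:
# most_likely = viterbi(rolls, pi, A, B)
```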
Practical Trick: Operate With Logarithms
Notation change warning: n = current time step (previously t), N = total number of time steps (previously T)
Prevents numerical underflow.
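For completeness, a sketch of the same trick applied to the forward algorithm: the entire recursion runs in log space using log-sum-exp (here via scipy.special.logsumexp), so long products of probabilities never underflow.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(obs, pi, A, B):
    """Forward recursion entirely in log space: log-sum-exp replaces the sum over
    previous states, so long chains of probabilities never underflow."""
    logA, logB = np.log(A), np.log(B)
    log_alpha = np.log(pi) + logB[:, obs[0]]
    for y in obs[1:]:
        log_alpha = logsumexp(log_alpha[:, None] + logA, axis=0) + logB[:, y]
    return log_alpha, logsumexp(log_alpha)    # unnormalized log alpha_T(s) and log P(Y_1...T)
```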
Training HMM Parameters
Baum-Welch algorithm, a special case of Expectation-Maximization (EM):
1. Make an initial guess at the model parameters.
2. Given the observation sequence, compute the hidden state posteriors P(S_t | Y_1...T, π, A, B) for t = 1 ... T.
3. Update the model parameters {π, A, B} based on the inferred states.
Guaranteed to move uphill in the total probability of the observation sequence, P(Y_1...T | π, A, B), but may get stuck in local optima.
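A rough Baum-Welch sketch along the lines of steps 1-3, reusing the forward, backward, and smooth functions defined earlier. The expected-count updates are the standard EM updates for discrete-emission HMMs; initialization strategy, convergence checks, and numerical safeguards are omitted.

```python
import numpy as np

def baum_welch(obs, pi, A, B, n_iter=50):
    """Sketch of Baum-Welch (EM): alternate between computing state posteriors with
    forward-backward (E-step) and re-estimating (pi, A, B) from expected counts (M-step)."""
    obs = np.asarray(obs)
    pi, A, B = pi.copy(), A.copy(), B.copy()
    for _ in range(n_iter):
        # E-step: posteriors under the current parameters.
        alpha, _ = forward(obs, pi, A, B)                  # P(S_t | Y_1...t)
        beta = backward(obs, A, B)                         # scaled backward messages
        gamma = smooth(alpha, beta)                        # P(S_t | Y_1...T)
        xi = (alpha[:-1, :, None] * A[None] *              # P(S_t, S_{t+1} | Y_1...T), up to
              (B[:, obs[1:]].T * beta[1:])[:, None, :])    # the per-t normalization below
        xi /= xi.sum(axis=(1, 2), keepdims=True)
        # M-step: expected-count updates.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            B[:, k] = gamma[obs == k].sum(axis=0)          # expected count of emitting symbol k
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B
```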
Using HMM For Classification
Suppose we want to recognize spoken digits 0, 1, ..., 9.
Each HMM is a model of the production of one digit and specifies P(Y | M_i):
Y: observed acoustic sequence (note: Y can be a continuous RV)
M_i: model for digit i
We want to compute the model posteriors P(M_i | Y); use Bayes rule: P(M_i | Y) ∝ P(Y | M_i) P(M_i).
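A sketch of this classification recipe, assuming discrete observations and the forward function from earlier for the per-model log-likelihoods log P(Y | M_i); the uniform prior over digit models and the function name are assumptions of the sketch.

```python
import numpy as np
from scipy.special import logsumexp

def classify_digit(obs, digit_models, log_prior=None):
    """Bayes rule over per-digit HMMs: P(M_i | Y) is proportional to P(Y | M_i) P(M_i).
    Each model is a (pi, A, B) triple; log P(Y | M_i) comes from the forward algorithm."""
    if log_prior is None:                                  # uniform prior over digits
        log_prior = np.full(len(digit_models), -np.log(len(digit_models)))
    loglik = np.array([forward(obs, *m)[1] for m in digit_models])
    log_post = loglik + log_prior
    return int(np.argmax(log_post)), log_post - logsumexp(log_post)   # best digit, log P(M_i | Y)
```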
The Landscape
Discrete state space: HMM
Continuous state space, linear dynamics: Kalman filter (exact inference)
Continuous state space, nonlinear dynamics: particle filter (approximate inference)