Sequential Approximate Inference with Limited Resolution Measurements

Approximate Inference through Sequential Measurements of Likelihoods Accounts for Hick’s Law
Xiang Li (NYU)
with Luigi Acerbi and Wei Ji Ma
1
Which category did the black dot come from?
Li & Ma (2017)
2
Inference among multiple categories
Optimal inference is implemented by Bayes’ rule, $P_i = L_i / \sum_k L_k$, which maximizes probability correct.
3
Inference among multiple categories
Optimal inference is implemented by Bayes’ rule, $P_i = L_i / \sum_k L_k$, which maximizes probability correct.
What if inference has limited “resolution”?
Specifically, we assume that the likelihoods can only be “measured” in a noisy way.
4
Inference among multiple categories
Optimal inference is implemented by Bayes’ rule, $P_i = L_i / \sum_k L_k$, which maximizes probability correct.
What if inference has limited “resolution”?
Specifically, we assume that the likelihoods can only be “measured” in a noisy way.
Central question of the project: how to sequentially refine the approximation if each measurement is costly?
We are looking for an active sampling strategy.
5
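To ground the Bayes’ rule step, here is a minimal sketch (my own illustration, not from the slides) of the ideal computation when the likelihoods are known exactly: normalize the likelihoods and report the category with the largest posterior.

```python
import numpy as np

# Hypothetical likelihood values for N = 4 categories (illustrative numbers only).
L = np.array([0.2, 1.5, 0.7, 0.1])

# Bayes' rule with a flat prior over categories: P_i = L_i / sum_k L_k.
P = L / L.sum()

# Reporting the category with the highest posterior maximizes probability correct.
choice = int(np.argmax(P))
print(P, choice)
```

The rest of the talk asks what to do when the L_i themselves are only available through costly, noisy measurements.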
Related frameworks
Evidence accumulation (Wald & Wolfowitz, 1948; Ratcliff, 1978): but not Bayesian
6
Related frameworks
Evidence accumulation (Wald & Wolfowitz, 1948; Ratcliff, 1978): but not Bayesian
Resource rationality (Griffiths, Lieder, & Goodman, 2015): but they sample from a known posterior. We gradually approximate the posterior.
7
Related frameworks
Evidence accumulation (Wald & Wolfowitz, 1948; Ratcliff, 1978): but not Bayesian
Resource rationality (Griffiths, Lieder, & Goodman, 2015): but they sample from a known posterior. We gradually approximate the posterior.
Information sampling (Oaksford & Chater, 1994): but this is sampling directly from the environment. We work at a more abstract level.
8
Related frameworks
Evidence accumulation (Wald & Wolfowitz, 1948; Ratcliff, 1978): but not Bayesian
Resource rationality (Griffiths, Lieder, & Goodman, 2015): but they sample from a known posterior. We gradually approximate the posterior.
Information sampling (Oaksford & Chater, 1994): but this is sampling directly from the environment. We work at a more abstract level.
Bayesian optimization (Jones, Schonlau, & Welch, 1998): a machine learning technique to choose a sequence of parameter combinations to evaluate
9
Inference from observation
N categories
Each observation is informative only about one category
10
Inference from observation
N categories
Each observation is informative only about one category
Likelihood of the ith category: $L_i(x_i) = p(x_i \mid C = C_i)$, where x_i is the set of observations made for the ith category
11
Inference from observation
N categories
Each observation is informative only about one category
Likelihood of the ith category: $L_i(x_i) = p(x_i \mid C = C_i)$, where x_i is the set of observations made for the ith category
Posterior of the ith category: $P_i = L_i / \sum_k L_k$
12
Abstraction step
Now forget about the “raw observations” x_i and think about the likelihoods L_i abstractly
13
Abstraction step
Now forget about the “raw observations” x_i and think about the likelihoods L_i abstractly
Posterior of the ith category: $P_i = L_i / \sum_k L_k$
14
Abstraction step
Now forget about the “raw observations” x_i and think about the likelihoods L_i abstractly
Posterior of the ith category: $P_i = L_i / \sum_k L_k$
Key concept: instead of explicitly modeling noisy observations from the environment, we model noisy “measurements” of the likelihood.
jth measurement of the ith likelihood: l_ij
Assume lognormal noise: $p(\log l_{ij} \mid L_i) = \mathcal{N}(\log l_{ij}; \log L_i, \sigma^2)$
Assume conditional independence across measurements
15
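A minimal sketch of this measurement model under the lognormal-noise assumption above; the function name and the value of sigma are mine, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def measure_likelihood(L_i, sigma=0.5, rng=rng):
    """One noisy 'measurement' l_ij of the likelihood L_i:
    log l_ij | L_i ~ Normal(log L_i, sigma^2), i.e. lognormal noise."""
    return np.exp(rng.normal(np.log(L_i), sigma))

# Repeated measurements of the same likelihood scatter around its true value.
L_true = 0.8  # illustrative
print([measure_likelihood(L_true) for _ in range(5)])
```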
Additional assumptions
Prior over the likelihoods L_i: $p(\log L_i) = \mathcal{N}(\log L_i; \mu_L, \sigma_L^2)$
16
Additional assumptions
Prior over the likelihoods L_i: $p(\log L_i) = \mathcal{N}(\log L_i; \mu_L, \sigma_L^2)$
A sequence of actions in discrete time
Action set: at each time step, the agent has two possible actions:
Decide on a category and receive a reward for correctness
Make a new measurement: select a category, receive one measurement from that category, pay a cost, go to the next time step
17
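A sketch of these additional assumptions in code; the hyperparameters, reward, and cost values are hypothetical, since the slides do not give numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Prior over likelihoods: log L_i ~ Normal(mu_L, sigma_L^2) (illustrative hyperparameters).
mu_L, sigma_L = 0.0, 1.0
N = 4
L_true = np.exp(rng.normal(mu_L, sigma_L, size=N))  # true likelihoods, hidden from the agent

# Per-step action set: either decide on a category, or measure one category.
reward_correct = 1.0         # received if the decided category is the correct one
cost_per_measurement = 0.01  # paid for every new measurement
actions = [("decide", i) for i in range(N)] + [("measure", i) for i in range(N)]
```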
Summary of the problem
There are true posteriors but those are unknown to the agent.
18
Summary of the problem
There are true posteriors but those are unknown to the agent.
Given a prior over likelihoods L_i, noise in the measurements of L_i, a reward for correctness, and a cost per measurement, what is a good policy?
19
Summary of the problem
There are true posteriors but those are unknown to the agent.
Given a prior over likelihoods L_i, noise in the measurements of L_i, a reward for correctness, and a cost per measurement, what is a good policy?
The Bellman equations are intractable. Therefore, we consider myopic policies instead (thinking one step ahead).
20
Approximate algorithm
Step 1. Based on the current noisy measurements of all likelihoods, calculate a posterior distribution over each category likelihood {L_1, ..., L_N}.
21
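Because the prior over log L_i and the measurement noise are both Gaussian on the log scale, Step 1 has a closed form by standard Gaussian conjugacy. The helper below is a sketch of that update (my own function, not the authors’ code), reusing the parameter names from the slides.

```python
import numpy as np

def posterior_over_logL(log_meas, mu_L=0.0, sigma_L=1.0, sigma=0.5):
    """Gaussian posterior over log L_i given its measurements log l_i1, ..., log l_ij.
    Prior: log L_i ~ N(mu_L, sigma_L^2); noise: log l_ij | L_i ~ N(log L_i, sigma^2)."""
    n = len(log_meas)
    post_var = 1.0 / (1.0 / sigma_L**2 + n / sigma**2)
    post_mean = post_var * (mu_L / sigma_L**2 + np.sum(log_meas) / sigma**2)
    return post_mean, post_var
```

With no measurements yet for a category, the function simply returns the prior.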
Approximate algorithm
Step 2. Use those to calculate a posterior distribution over the posteriors {P_1, ..., P_N}.
22
Approximate algorithm
Step 3. Based on that posterior, calculate a posterior distribution over the max posterior M = max{P_1, ..., P_N}. We denote this by p(M | data).
23
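Steps 2 and 3 can be approximated by Monte Carlo: sample the likelihoods from their per-category beliefs, normalize to get posteriors, and take the maximum. The slides do not say how p(M | data) is computed in practice; the sketch below is one straightforward way, building on the Gaussian beliefs from the previous sketch.

```python
import numpy as np

def sample_M(post_means, post_vars, n_samples=5000, rng=None):
    """Monte Carlo samples of M = max_i P_i, where P_i = L_i / sum_k L_k and,
    under the current belief, log L_i ~ N(post_means[i], post_vars[i])."""
    rng = rng if rng is not None else np.random.default_rng(3)
    logL = rng.normal(post_means, np.sqrt(post_vars),
                      size=(n_samples, len(post_means)))
    L = np.exp(logL)
    P = L / L.sum(axis=1, keepdims=True)  # Step 2: samples of (P_1, ..., P_N)
    return P.max(axis=1)                  # Step 3: samples from p(M | data)

# Var(M | data) is then just the variance of the returned samples.
```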
Approximate algorithm
Step 4. Consider each category a candidate for making another measurement.
Separately for each category i, simulate a new measurement from that category based on current knowledge (Thompson sampling).
Calculate the reduction R_i in Var(M | data) due to this simulated measurement.
Step 5. Is max{R_1, ..., R_N} > threshold?
If so, choose the argmax category to make a new measurement from. Make an actual measurement. Go back to Step 1.
If not, decide: choose the category that maximizes the expected posterior E[P_i].
24
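Putting Steps 1–5 together: the sketch below is one way to implement the full myopic loop, reusing the hypothetical helpers from the earlier sketches (measure_likelihood, posterior_over_logL, sample_M). The threshold and Monte Carlo settings are illustrative, and this is not the authors’ implementation.

```python
import numpy as np

def myopic_policy(L_true, mu_L=0.0, sigma_L=1.0, sigma=0.5, threshold=1e-3, seed=4):
    """Run the myopic measurement policy once; returns (decision, number of measurements)."""
    rng = np.random.default_rng(seed)
    N = len(L_true)
    log_meas = [[] for _ in range(N)]  # log-measurements collected so far, per category
    n_meas = 0
    while True:
        # Step 1: Gaussian posterior over each log L_i from its measurements.
        beliefs = [posterior_over_logL(m, mu_L, sigma_L, sigma) for m in log_meas]
        means = np.array([b[0] for b in beliefs])
        vars_ = np.array([b[1] for b in beliefs])
        # Steps 2-3: Monte Carlo posterior over M = max_i P_i.
        var_M = sample_M(means, vars_, rng=rng).var()
        # Step 4: for each candidate category, Thompson-sample a likelihood, simulate one
        # new measurement from it, and compute the reduction R_i in Var(M | data).
        R = np.zeros(N)
        for i in range(N):
            L_i_ts = np.exp(rng.normal(means[i], np.sqrt(vars_[i])))    # Thompson sample
            fake_meas = np.log(measure_likelihood(L_i_ts, sigma, rng))  # simulated measurement
            new_mean, new_var = posterior_over_logL(log_meas[i] + [fake_meas],
                                                    mu_L, sigma_L, sigma)
            means_i, vars_i = means.copy(), vars_.copy()
            means_i[i], vars_i[i] = new_mean, new_var
            R[i] = var_M - sample_M(means_i, vars_i, rng=rng).var()
        # Step 5: if the best (noisy) reduction exceeds the threshold, actually measure...
        if R.max() > threshold:
            i = int(np.argmax(R))
            log_meas[i].append(np.log(measure_likelihood(L_true[i], sigma, rng)))
            n_meas += 1
        else:
            # ...otherwise decide on the category with the highest expected posterior E[P_i].
            logL = rng.normal(means, np.sqrt(vars_), size=(5000, N))
            P = np.exp(logL) / np.exp(logL).sum(axis=1, keepdims=True)
            return int(np.argmax(P.mean(axis=0))), n_meas
```

Sweeping the number of categories N and recording the returned measurement count gives the kind of iterations-versus-set-size curve summarized on the Result and Conclusion slides.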
Result
[Figure: simulation results for low vs. high sampling cost]
25
Hick’s law
[Figure from Hick, 1952]
26
Hick’s law
Choice reaction time increases approximately logarithmically with the number of response alternatives.
Found experimentally in light detection (Hick, 1952), dual tasks (Damos & Wickens, 1977), pitch judgment (Simpson & Huron, 1994), and keystrokes (Logan, Ulrich, & Lindsey, 2016).
Has been explained in terms of a competing race model (Usher & McClelland, 2001) and a memory-based model (Schneider & Anderson, 2010).
27
Conclusion
We conceptualized approximate inference as a process of making noisy “measurements” of category likelihoods (one category at a time).
A myopic iterative measurement policy shows a relationship between the number of iterations and the number of categories that resembles Hick’s law.
28
Open questions and possible extensions
More direct experimental evidence?
The assumption that each observation is informative only about one category
Other stopping rules
Other acquisition functions
Memory decay
Relation to other models
29