Active Inference and Epistemic Value
Karl Friston, Francesco Rigoli, Dimitri Ognibene, Christoph Mathys, Thomas FitzGerald and Giovanni Pezzulo

Abstract
We offer a formal treatment of choice behaviour based on the premise that agents minimise the expected free energy of
future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic
(intrinsic) value. Minimising expected free energy is therefore equivalent to maximising extrinsic value or expected utility
(defined in terms of prior preferences or goals), while maximising information gain or intrinsic value; i.e., reducing
uncertainty about the causes of valuable outcomes. The resulting scheme resolves the exploration-exploitation dilemma:
epistemic value is maximised until there is no further information gain, after which exploitation is assured through
maximisation of extrinsic value. This is formally consistent with the Infomax principle, generalising formulations of active
vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk sensitive (KL)
control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems; ad hoc softmax
parameters become the expected (Bayes-optimal) precision of beliefs about – or confidence in – policies. We focus on the
basic theory – illustrating the minimisation of expected free energy using simulations. A key aspect of this minimisation is
the similarity of precision updates and dopaminergic discharges observed in conditioning paradigms.
Premise
All agents minimize free energy (under a generative model)
All agents possess prior beliefs (preferences)
Free energy is minimized when priors are (actively) realized 
All agents believe they will minimize (expected) free energy
Set-up and definitions: active inference

[Figure: the perception–action cycle, coupling the world (generative process) to the agent (generative model and approximate posterior) through action and perception.]

Definition: active inference rests on the tuple $(\Omega, P, Q, R, S, A, U)$:
A finite set of observations $\Omega$
A finite set of actions $A$
A finite set of hidden states $S$
A finite set of control states $U$
A generative process $R(\tilde o, \tilde s, \tilde a)$ over observations $\tilde o \in \Omega$, hidden states $\tilde s \in S$ and actions $\tilde a \in A$
A generative model $P(\tilde o, \tilde s, \tilde u \mid m)$ over observations, hidden states and control states $\tilde u \in U$
An approximate posterior $Q(\tilde s, \tilde u \mid \mu)$ over hidden and control states with sufficient statistics $\mu$, where a policy $\pi \in \{1, \dots, K\}$ indexes a sequence of control states $(u_t, \dots, u_T)$

Perception minimises free energy with respect to the sufficient statistics, $\mu = \arg\min_\mu F(\tilde o, \mu)$, while action realises posterior beliefs about control, $\Pr(a_t = u \mid \tilde o) = Q(u_t = u)$.

Free energy:
$F(\tilde o, \mu) = E_Q[-\ln P(\tilde o, \tilde s, \tilde u \mid m)] - H[Q(\tilde s, \tilde u)] = -\ln P(\tilde o \mid m) + D_{KL}[Q(\tilde s, \tilde u) \,\|\, P(\tilde s, \tilde u \mid \tilde o)]$
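As a sanity check on this definition, here is a minimal numerical sketch (an illustration under assumed toy numbers, not the authors' implementation) showing that free energy upper-bounds surprise, $-\ln P(\tilde o \mid m)$, and that the bound is tight at the exact posterior.

```python
import numpy as np

# Toy single-time-step model (hypothetical numbers): 2 hidden states, 2 outcomes.
A = np.array([[0.9, 0.2],        # likelihood P(o | s); columns index states
              [0.1, 0.8]])
D = np.array([0.5, 0.5])         # prior P(s | m)
o = 0                            # the outcome that was actually observed

def free_energy(Q, o):
    """F = E_Q[ln Q(s) - ln P(o, s | m)] >= -ln P(o | m)."""
    joint = A[o, :] * D          # P(o, s | m) for the observed o
    return np.sum(Q * (np.log(Q) - np.log(joint)))

surprise = -np.log(np.sum(A[o, :] * D))
posterior = A[o, :] * D / np.sum(A[o, :] * D)

print(free_energy(posterior, o), surprise)          # equal: the bound is tight
print(free_energy(np.array([0.5, 0.5]), o))         # larger: any other belief
```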
 
 
 
 
 
 
 
An example:

Control states: reject (stay) or accept (shift)
Hidden states: e.g., a low or high offer that may be accepted or withdrawn, with transitions $P(s_{t+1} \mid s_t, u_t)$ parameterised by probabilities $p$, $q$ and $r$

[Figure: the two transition matrices $P(s_{t+1} \mid s_t, u_t = \text{reject/stay})$ and $P(s_{t+1} \mid s_t, u_t = \text{accept/shift})$.]
The (normal form) generative model

$P(\tilde o, \tilde s, \tilde u, \gamma \mid \tilde a, m) = P(\tilde o \mid \tilde s)\, P(\tilde s \mid \tilde a)\, P(\tilde u \mid \gamma)\, P(\gamma \mid m)$

Likelihood: $P(\tilde o \mid \tilde s) = P(o_0 \mid s_0) \cdots P(o_t \mid s_t)$, with $\mathbf{A} = P(o_t \mid s_t)$
Empirical priors – hidden states: $P(\tilde s \mid \tilde a) = P(s_0 \mid m)\, P(s_1 \mid s_0, a_0) \cdots P(s_t \mid s_{t-1}, a_{t-1})$, with $\mathbf{B}(u) = P(s_{t+1} \mid s_t, u)$
Priors – control states: $\ln P(\pi \mid \gamma) = \gamma \cdot \mathbf{Q}(\pi) + \text{const.}$, where $\mathbf{Q}(\pi) = \mathbf{Q}(\pi, t+1) + \cdots + \mathbf{Q}(\pi, T)$
Full priors: $P(s_0 \mid m) = \mathbf{D}$, $P(\gamma \mid m) = \Gamma(\alpha, \beta)$, with prior preferences $\mathbf{C} = \ln P(o_\tau \mid m)$

[Figure: the corresponding Bayesian network over hidden states, control states, action and observations.]
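As a concrete, hypothetical instance of this parameterisation, the sketch below sets out $\mathbf{A}$, $\mathbf{B}(u)$, $\mathbf{C}$ and $\mathbf{D}$ for a toy two-state, two-outcome problem; the numbers are purely illustrative.

```python
import numpy as np

# A: likelihood, P(o_t | s_t); columns index hidden states.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# B(u): one transition matrix per control state, P(s_{t+1} | s_t, u).
B = [np.eye(2),                              # u = 0: stay
     np.array([[0., 1.], [1., 0.]])]         # u = 1: switch

# C: prior preferences over outcomes, ln P(o_tau | m) (defined up to a constant).
C = np.log(np.array([0.8, 0.2]))

# D: prior beliefs about the initial hidden state, P(s_0 | m).
D = np.array([0.5, 0.5])
```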
Priors over policies

Prior beliefs about policies: $\ln P(\pi \mid \gamma) = \gamma \cdot \mathbf{Q}(\pi)$, where $\mathbf{Q}(\pi) = \mathbf{Q}(\pi, t+1) + \cdots + \mathbf{Q}(\pi, T)$

Expected free energy – the quality of a policy at a future time $\tau$:
$\mathbf{Q}(\pi, \tau) = E_{\tilde Q}[\ln P(o_\tau, s_\tau \mid m)] + H[Q(s_\tau \mid \pi)]$, with $\tilde Q = Q(o_\tau, s_\tau \mid \pi) = P(o_\tau \mid s_\tau)\, Q(s_\tau \mid \pi)$

This quality can be decomposed in two equivalent ways:
$\mathbf{Q}(\pi, \tau) = E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)] + E_{Q(o_\tau \mid \pi)}\big[D_{KL}[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big]$
(extrinsic value plus epistemic value, i.e., Bayesian surprise)
$\mathbf{Q}(\pi, \tau) = -D_{KL}[Q(o_\tau \mid \pi) \,\|\, P(o_\tau \mid m)] - E_{Q(s_\tau \mid \pi)}\big[H[P(o_\tau \mid s_\tau)]\big]$
(negative predicted divergence minus predicted ambiguity)

Special cases:
KL or risk-sensitive control – in the absence of ambiguity:
$\mathbf{Q}(\pi, \tau) = -D_{KL}[Q(o_\tau \mid \pi) \,\|\, P(o_\tau \mid m)]$ (predicted divergence)
Expected utility theory – in the absence of posterior uncertainty or risk:
$\mathbf{Q}(\pi, \tau) = E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]$ (extrinsic value)
Bayesian surprise and Infomax – in the absence of prior beliefs about outcomes:
$\mathbf{Q}(\pi, \tau) = E_{Q(o_\tau \mid \pi)}\big[D_{KL}[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big] = D_{KL}[Q(s_\tau, o_\tau \mid \pi) \,\|\, Q(s_\tau \mid \pi)\, Q(o_\tau \mid \pi)]$ (predicted mutual information)
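The equivalence of the two decompositions above can be checked numerically. The sketch below (hypothetical numbers, not the paper's simulation) evaluates extrinsic plus epistemic value, and negative divergence minus ambiguity, for a one-step-ahead prediction; the two quantities coincide.

```python
import numpy as np

A = np.array([[0.9, 0.1],                 # likelihood P(o | s); columns are states
              [0.1, 0.9]])
C = np.log(np.array([0.7, 0.3]))          # prior preferences ln P(o | m)
Qs = np.array([0.6, 0.4])                 # predicted states Q(s_tau | pi)
Qo = A @ Qs                               # predicted outcomes Q(o_tau | pi)

# Decomposition 1: extrinsic value + epistemic value (expected information gain).
extrinsic = Qo @ C
posterior = (A * Qs) / Qo[:, None]        # Q(s | o, pi); rows index outcomes
epistemic = sum(Qo[o] * np.sum(posterior[o] * np.log(posterior[o] / Qs))
                for o in range(len(Qo)))

# Decomposition 2: -(predicted divergence) - (predicted ambiguity).
divergence = np.sum(Qo * (np.log(Qo) - C))
ambiguity = Qs @ (-np.sum(A * np.log(A), axis=0))

print(extrinsic + epistemic, -divergence - ambiguity)   # the same number twice
```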
 
 
 
 
 
 
 
 
 
 
The quality of a policy corresponds to (negative) expected free energy

$\mathbf{Q}(\pi, \tau) = E_{\tilde Q}[\ln P(o_\tau, s_\tau \mid m)] + H[Q(s_\tau \mid \pi)]$

Generative model of future states: $P(o_\tau, s_\tau \mid m) = P(o_\tau \mid s_\tau)\, P(s_\tau \mid m)$
Future generative model of states: $P(o_\tau, s_\tau \mid \pi) = Q(s_\tau \mid o_\tau, \pi)\, P(o_\tau \mid m)$
Prior preferences (goals) over future outcomes: $C(o_\tau) = \ln P(o_\tau \mid m)$
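For readers who want the intermediate step, here is a short derivation (a sketch using the definitions above, substituting the future generative model $P(o_\tau, s_\tau \mid \pi) = Q(s_\tau \mid o_\tau, \pi)\, P(o_\tau \mid m)$ into the expectation) of the extrinsic/epistemic decomposition:

```latex
\begin{aligned}
\mathbf{Q}(\pi,\tau)
  &= E_{\tilde Q}\big[\ln P(o_\tau, s_\tau \mid m)\big] + H\big[Q(s_\tau \mid \pi)\big],
     \qquad \tilde Q = P(o_\tau \mid s_\tau)\,Q(s_\tau \mid \pi) \\
  &= E_{\tilde Q}\big[\ln P(o_\tau \mid m) + \ln Q(s_\tau \mid o_\tau, \pi)\big]
     + H\big[Q(s_\tau \mid \pi)\big]
     \qquad \text{(future generative model)} \\
  &= E_{Q(o_\tau \mid \pi)}\big[\ln P(o_\tau \mid m)\big]
     + E_{\tilde Q}\big[\ln Q(s_\tau \mid o_\tau, \pi) - \ln Q(s_\tau \mid \pi)\big] \\
  &= \underbrace{E_{Q(o_\tau \mid \pi)}\big[\ln P(o_\tau \mid m)\big]}_{\text{extrinsic value}}
     + \underbrace{E_{Q(o_\tau \mid \pi)}\Big[D_{KL}\big[Q(s_\tau \mid o_\tau, \pi)\,\big\|\,Q(s_\tau \mid \pi)\big]\Big]}_{\text{epistemic value}}
\end{aligned}
```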
Minimising free energy

The mean field partition – the approximate posterior factorises into marginals over hidden states at each point in time, over policies (sequences of control states) and over precision:
$Q(\tilde s, \tilde u, \gamma) = Q(s_0) \cdots Q(s_T)\, Q(\tilde u)\, Q(\gamma)$

And variational updates – each marginal is updated given expectations under the others:
$Q(s_t) \propto \exp\big(E_{Q_{/s_t}}[\ln P(\tilde o, \tilde s, \tilde u, \gamma \mid m)]\big)$
$Q(\tilde u) \propto \exp\big(E_{Q_{/\tilde u}}[\ln P(\tilde o, \tilde s, \tilde u, \gamma \mid m)]\big)$
$Q(\gamma) \propto \exp\big(E_{Q_{/\gamma}}[\ln P(\tilde o, \tilde s, \tilde u, \gamma \mid m)]\big)$
 
Variational updates and functional anatomy

Perception (state estimation): $\vec s^{\,\pi}_{t+1} = \sigma\big(\ln \mathbf{A} \cdot o_{t+1} + \ln \mathbf{B}(a_t)\,\vec s^{\,\pi}_t\big)$, where $\sigma(\cdot)$ is a softmax (normalised exponential)
Forward sweeps over future states: $\vec s^{\,\pi}_{\tau} = \mathbf{B}(u_\tau) \cdots \mathbf{B}(u_t)\,\vec s_t$
Policy evaluation: $\mathbf{Q}(\pi) = \mathbf{Q}(\pi, t+1) + \cdots + \mathbf{Q}(\pi, T)$, where each $\mathbf{Q}(\pi, \tau)$ is the negative predicted divergence minus the predicted ambiguity, evaluated from $\mathbf{A}$, $\mathbf{C}$ and the predicted states
Action selection: $\vec\pi = \sigma(\gamma \cdot \mathbf{Q})$, and action realises the most probable control state, $\Pr(a_t = u_t \mid \tilde o) = Q(u_t)$
Precision: $\gamma = \alpha/\hat\beta$ with $\hat\beta = \beta - \vec\pi \cdot \mathbf{Q}$, so expected precision increases with the expected value of policies

[Figure: proposed functional anatomy, associating perception with sensory (occipital) cortex, forward sweeps over future states with the hippocampus, policy evaluation (predicted divergence and predicted ambiguity) with prefrontal cortex, action selection with the striatum and motor cortex, and precision with the dopaminergic midbrain.]
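Putting the updates together, the following sketch interleaves policy evaluation, precision, action selection and perception for a single time step. It is an illustration only (not the authors' SPM/DEM code); the toy model, the parameters $\alpha = 8$, $\beta = 1$ and the exact form of the precision update are assumptions.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)
    return np.exp(x) / np.exp(x).sum()

def policy_quality(A, C, s_pred):
    """Q(pi, tau) = -(predicted divergence) - (predicted ambiguity)."""
    o_pred = A @ s_pred
    divergence = o_pred @ (np.log(o_pred) - C)
    ambiguity = s_pred @ (-np.sum(A * np.log(A), axis=0))
    return -divergence - ambiguity

# Toy model (hypothetical numbers): 2 states, 2 outcomes, 2 one-step policies.
A = np.array([[0.9, 0.1], [0.1, 0.9]])              # likelihood P(o | s)
B = [np.eye(2), np.array([[0., 1.], [1., 0.]])]     # B(u): stay, switch
C = np.log(np.array([0.8, 0.2]))                    # preferences ln P(o | m)
s = np.array([0.7, 0.3])                            # current beliefs Q(s_t)
alpha, beta = 8.0, 1.0                              # assumed Gamma(alpha, beta)

# Forward sweep: evaluate the quality of each one-step policy (prefrontal).
Q = np.array([policy_quality(A, C, B[u] @ s) for u in range(2)])

# Iterate precision and policy beliefs to convergence (midbrain <-> striatum).
gamma = alpha / beta
for _ in range(16):
    pi = softmax(gamma * Q)                 # posterior over policies
    gamma = alpha / (beta - pi @ Q)         # assumed form of the precision update

# Action realises the most probable control state (motor cortex).
a = int(np.argmax(pi))

# Perception: update state beliefs after the new observation o (sensory cortex).
o = 0
s = softmax(np.log(A[o, :]) + np.log(B[a] @ s))
print(np.round(pi, 3), round(float(gamma), 3), a, np.round(s, 3))
```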
The (T-maze) problem

[Figure: a T-maze in which the reward (US) is in the right or left arm depending on a hidden context; a discriminative cue (CS) in the lower arm discloses the context.]

Generative model
Hidden states: location (centre, right arm, left arm, lower/cue arm) × context (reward right or reward left)
Control states: moves to each of the four locations, $\mathbf{B}(u)$ (the baited arms are absorbing)
Observations: location × stimulus (CS, CS′, US, NS), where the cue is ambiguous at the centre and discloses the context in the lower arm, $\mathbf{A}$
Prior preferences: $\mathbf{C}$ makes the rewarding outcome (US) attractive, with utility (prior preference) $c$
Prior beliefs about control: $Q(u_t, \dots, u_T) = \sigma(\gamma \cdot \mathbf{Q}(\pi))$, combining extrinsic and epistemic value
Posterior beliefs about states: beliefs about location and context, updated as outcomes are observed
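To see why epistemic value sends an uncertain agent to the cue location, here is a toy calculation (hypothetical outcome coding, not the paper's simulation) of the expected information gain about the context afforded by staying at the centre versus visiting the lower (cue) arm.

```python
import numpy as np

def epistemic_value(A, Qs):
    """Expected information gain about the context under outcome mapping A."""
    Qo = A @ Qs
    value = 0.0
    for o in np.flatnonzero(Qo):
        post = A[o, :] * Qs / Qo[o]              # Q(context | o)
        nz = post > 0
        value += Qo[o] * np.sum(post[nz] * np.log(post[nz] / Qs[nz]))
    return value

Qs = np.array([0.5, 0.5])                        # context: reward right / left

# Outcome mapping P(o | context) induced by each location (hypothetical coding):
centre  = np.array([[0.5, 0.5],                  # cue at the centre is ambiguous
                    [0.5, 0.5]])
cue_arm = np.array([[1.0, 0.0],                  # lower arm discloses the context
                    [0.0, 1.0]])

print(epistemic_value(centre, Qs))               # 0.0   - nothing to learn
print(epistemic_value(cue_arm, Qs))              # ln 2  - context fully disclosed
```

Once the cue has been sampled, the posterior over context collapses, epistemic value falls to zero everywhere, and extrinsic value (the preference for the US) dominates: the transition from exploration to exploitation described in the abstract.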
Comparing different schemes

[Figure: performance (success rate, %) as a function of prior preference for the different schemes (FE, KL, EU, DA).]

$\mathbf{Q}(\pi, \tau) = E_{\tilde Q}\big[\ln P(o_\tau \mid s_\tau) - \ln Q(o_\tau \mid \pi) + \ln P(o_\tau \mid m)\big]$

Expected utility retains only the last (preference) term, KL control retains the last two, and expected free energy retains all three, which determines their sensitivity to risk and ambiguity:

                           Sensitive to risk   Sensitive to ambiguity
Expected utility (EU)             -                      -
KL control (KL)                   +                      -
Expected free energy (FE)         +                      +
Simulating conditioned responses (in terms of precision updates)

[Figure: simulated precision updates and firing rates over peristimulus time, alongside observed dopamine responses to conditioned (CS) and unconditioned (US) stimuli.]

Expected precision and value
Expected precision is an increasing function of the expected value of policies, $\gamma = \alpha/(\beta - \vec\pi \cdot \mathbf{Q})$, so changes in expected precision reflect changes in expected value: c.f., dopamine and reward prediction error (see the sketch below).
[Figure: expected precision plotted against the expected value of policies.]
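A two-line sketch of that monotone relationship, under the assumed update form and hypothetical parameters $\alpha = 8$, $\beta = 1$:

```python
import numpy as np

alpha, beta = 8.0, 1.0                      # assumed Gamma prior parameters
value = np.linspace(-8.0, 0.0, 9)           # expected value pi . Q of policies
gamma = alpha / (beta - value)              # assumed precision update (see above)
print(np.round(gamma, 2))                   # rises from ~0.9 to 8 as value -> 0
```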
 
 
Simulating conditioned responses

[Figure: simulated responses over peristimulus time for different levels of prior preference (utility: c = 0, 1, 2 at a = 0.5) and for different levels of cue uncertainty (a = 0.5, 0.7, 0.9 at c = 2).]
 
 
Learning as inference

Hierarchical augmentation of the state space: hidden states are extended to include which maze (cue–outcome contingency) is in play, in addition to location and context, with a separate likelihood mapping for each maze.
Hidden states: maze × location × context
Control states: moves to each location (as before)

Bayesian belief updating between trials of conserved (maze) states: posterior beliefs about the maze at the end of one trial are carried over as prior beliefs for the next (see the sketch below), so uncertainty about the maze is resolved over repeated trials.

Learning as inference (performance)
[Figure: over successive trials, performance rises as uncertainty about the maze falls; average and simulated dopaminergic (precision) responses over the corresponding variational updates.]
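A minimal sketch of the between-trial updating described above; the mixture form and the forgetting rate are assumptions for illustration.

```python
import numpy as np

e = 0.1                                              # assumed forgetting rate
n_mazes = 8
posterior = np.zeros(n_mazes); posterior[2] = 1.0    # maze beliefs at end of trial

# Carry the conserved (maze) beliefs over as the next trial's prior,
# mixed with a uniform distribution so that learning can be revised.
prior_next = (1 - e) * posterior + e * np.ones(n_mazes) / n_mazes
print(prior_next)
```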
Summary
Optimal behaviour can be cast as a pure inference problem, in which valuable outcomes are defined in terms of prior
beliefs about future states.
Exact Bayesian inference (perfect rationality) cannot be realised physically, which means that optimal behaviour rests on
approximate Bayesian inference (bounded rationality).
Variational free energy provides a bound on Bayesian model evidence that is optimised by bounded rational behaviour.
Bounded rational behaviour requires (approximate Bayesian) inference on both hidden states of the world and (future) control states. This mandates beliefs about action (control) that are distinct from action per se – beliefs that entail a precision.
These beliefs can be cast in terms of minimising the expected free energy, given current beliefs about the state of the
world and future choices.
The ensuing quality of a policy entails epistemic value and expected utility that account for exploratory and exploitative
behaviour respectively.
Variational Bayes provides a formal account of how posterior expectations about hidden states of the world, policies and
precision depend upon each other; and may provide a metaphor for message passing in the brain.
Beliefs about choices depend upon expected precision while beliefs about precision depend upon the expected quality of
choices.
Variational Bayes induces distinct probabilistic representations (functional segregation) of hidden states, control states and
precision – and highlights the role of reciprocal message passing. This may be particularly important for expected precision
that is required for optimal inference about hidden states (perception) and control states (action selection).
The dynamics of precision updates and their computational architecture are consistent with the physiology and anatomy of
the dopaminergic system.
