Levels of Analysis in Reinforcement Learning and Decision-Making

World models
CS786
3rd Feb 2022
Levels of analysis
Computational: what is the problem?
Algorithmic: how is the problem solved?
Implementation: how is it carried out by networks of neurons?
RL in the brain
 
What is the problem?
Reinforcement learning → learning preferences for actions that lead to desirable outcomes
How is it solved?
MDPs provide a general mathematical structure for solving decision problems under uncertainty
RL was developed as a set of online learning algorithms to solve MDPs
A critical component of model-free RL algorithms is the temporal difference signal
Hypothesis: is the brain implementing model-free RL?
Implementation
Spiking rates of dopaminergic neurons in the basal ganglia and ventral striatum behave as if they are encoding this TD signal
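The TD signal these neurons appear to encode can be written in a few lines. A minimal TD(0) sketch on an assumed chain of states with a reward at the end (an illustrative setup, not any specific experiment from the slides):

```python
def td_learning(episodes, alpha=0.1, gamma=1.0, n_states=5):
    """TD(0) value learning on a simple chain: start at state 0,
    reward 1.0 on reaching the end. Returns learned state values
    and the TD errors from the last episode."""
    V = [0.0] * (n_states + 1)  # extra slot: terminal state, value 0
    last_errors = []
    for _ in range(episodes):
        last_errors = []
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0   # reward only at the end
            delta = r + gamma * V[s + 1] - V[s]     # TD error (the dopamine-like RPE)
            V[s] += alpha * delta
            last_errors.append(delta)
    return V, last_errors
```

Early in training the TD error spikes at reward delivery; with training it shrinks toward zero as value propagates back to the earliest predictive state, mirroring the classic observation that dopamine responses shift from the reward to the cue that predicts it.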
Implication?
Model-free learning
Learn the mapping from action sequences to rewarding outcomes
Don't care about the physics of the world that leads to different outcomes
Is this a realistic model of how human and non-human animals learn?
LEARNING MAPS OF THE WORLD
 
Cognitive maps in rats and men
Rats learned a spatial model
Rats behave as if they had some sense of p(s'|s,a)
This was not explicitly trained
Generalized from previous experience
Corresponding paper is recommended reading
So is Tolman's biography
 
http://psychclassics.yorku.ca/Tolman/Maps/maps.htm
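One computational reading of "learning a cognitive map" is simply estimating p(s'|s,a) from experience. A minimal count-based sketch (the maze states and actions below are made up for illustration):

```python
from collections import defaultdict

def learn_transition_model(experience):
    """Estimate p(s'|s,a) by counting observed (s, a, s') triples.
    Returns a dict mapping (s, a) to a distribution over next states."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in experience:
        counts[(s, a)][s_next] += 1
    model = {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())          # normalize counts into probabilities
        model[sa] = {s2: n / total for s2, n in nexts.items()}
    return model
```

A model like this, learned during unrewarded exploration, is exactly the kind of knowledge that lets an animal take a novel shortcut once a goal appears, which stimulus-response learning alone cannot explain.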
The model-free vs model-based debate
Model-free learning → learn stimulus-response mappings = habits
What about goal-based decision-making?
Do animals not learn the physics of the world when making decisions?
Model-based learning → learn what to do based on how the world is currently set up = thoughtful responding?
People have argued for two systems
Thinking fast and slow (Balleine & O'Doherty, 2010)
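The algorithmic difference between the two systems can be sketched directly. The model-free learner caches action values; the model-based learner evaluates actions by unrolling a learned world model, so it reacts immediately if the world (or the reward structure) changes. A minimal sketch with assumed dictionary-based representations:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Model-free (habit-like): cache action values from raw experience.
    No knowledge of p(s'|s,a) is used or stored."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def plan_value(s, a, trans, rew, actions, gamma=0.9, depth=2):
    """Model-based (goal-directed): evaluate an action by recursively
    unrolling a learned transition model trans[(s,a)] -> {s': p}
    and reward table rew[(s,a)], to a fixed planning depth."""
    value = rew.get((s, a), 0.0)
    if depth == 0:
        return value
    for s2, p in trans.get((s, a), {}).items():
        value += gamma * p * max(
            plan_value(s2, a2, trans, rew, actions, gamma, depth - 1)
            for a2 in actions)
    return value
```

The behavioral signature follows from the code: after a reward devaluation, `plan_value` changes its answer as soon as `rew` is updated, while the cached `Q` values keep driving the old response until they are slowly relearned.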
Multiple modes of learning
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4074442/
A contemporary experiment
The Daw task (Daw et al., 2011) is a two-stage Markov decision task
Differentiates model-based and model-free RL accounts empirically
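The task structure is easy to simulate. A simplified single-trial sketch: a first-stage choice leads to its "common" second-stage state with high probability, otherwise the rare one. The payoff probabilities below are illustrative placeholders (in the actual Daw et al. task they drift slowly over trials):

```python
import random

def daw_two_step(action, common_prob=0.7, rng=random):
    """One trial of a simplified two-stage Markov task.
    action: first-stage choice, 0 or 1. Returns (second-stage state,
    whether the transition was the common one, binary reward)."""
    common = rng.random() < common_prob
    state2 = action if common else 1 - action   # common vs rare transition
    reward_prob = (0.8, 0.3)[state2]            # assumed payoffs, not Daw's
    reward = 1 if rng.random() < reward_prob else 0
    return state2, common, reward
```

The diagnostic is the interaction between reward and transition type: a model-free learner repeats a rewarded first-stage action regardless of how it got to the reward, while a model-based learner treats a reward after a rare transition as evidence favoring the other first-stage action.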
Predictions meet data
Behavior appears to be a mix of both strategies
What does this mean?
Active area of research
Some hunches
[Figure: behavior after moderate vs extensive training (Holland, 2004; Killcross & Coutureau, 2003)]
Current consensus
In moderately trained tasks, people behave as if they are using model-based RL
In highly trained tasks, people behave as if they are using model-free RL
Nuance:
Repetitive training on a small set of examples favors model-free strategies
Limited training on a larger set of examples favors model-based strategies
(Fulvio, Green & Schrater, 2014)
Big ticket application
How to practically shift behavior from habitual to goal-directed in the digital space
The reverse shift is understood pretty well by social media designers
The social media habituation cycle
[Figure: a loop cycling between State and Reward]
Designed based on cognitive psychology principles
Competing claims
"First World kids are miserable!"
(Twenge, Joiner, Rogers & Martin, 2017)
https://journals.sagepub.com/doi/full/10.1177/2167702617723376
"Not true!"
(Orben & Przybylski, 2019)
https://www.nature.com/articles/s41562-018-0506-1
Big ticket application
How to change computer interfaces from promoting habitual to thoughtful engagement
Depends on being able to measure habitual vs thoughtful behavior online