Deep Reinforcement Learning Overview and Applications


An overview of deep reinforcement learning, "on the road to Skynet": Markov Decision Processes and how to solve them, value functions and tabular solutions, function approximation with deep Q networks, policy gradients, and actor-critic methods. The deck also contrasts supervised, unsupervised, and reinforcement learning, and explains how the problem setting, the discount factor, and value functions shape effective learning algorithms.



Presentation Transcript


  1. DEEP REINFORCEMENT LEARNING ON THE ROAD TO SKYNET! UW CSE Deep Learning Felix Leeb

2. OVERVIEW
   TODAY: MDPs (formalizing decisions); Function Approximation; Value Functions (DQN); Policy Gradients (REINFORCE, NPG); Actor Critic (A3C, DDPG)
   NEXT TIME: Model-Based RL (forward/inverse); Planning (MCTS, MPPI); Imitation Learning (DAgger, GAIL); Advanced Topics (Exploration, MARL, Meta-learning, LMDPs)

3. PARADIGMS
   Supervised Learning: objectives include classification and regression.
   Unsupervised Learning: objectives include inference and generation.
   Reinforcement Learning: objectives include prediction and control.
   (Example applications of each paradigm were shown as images.)

4. PREDICTION vs. CONTROL
   (Examples of prediction tasks and control tasks were shown as images.)

5. SETTING
   The agent, acting according to its policy, takes an action in the environment; the environment returns the next state/observation and a reward, and the loop repeats.
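As a concrete illustration of this loop (not part of the slides), here is a minimal sketch against a gym-style environment; the environment name and the random policy are placeholder assumptions:

```python
# Minimal agent-environment interaction loop (classic gym API, pre-0.26, assumed).
import gym

env = gym.make("CartPole-v1")           # placeholder environment
obs = env.reset()
episode_return = 0.0

for t in range(200):
    action = env.action_space.sample()  # stand-in for a learned policy pi(a | obs)
    obs, reward, done, info = env.step(action)
    episode_return += reward
    if done:
        break

print("episode return:", episode_return)
```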

6. MARKOV DECISION PROCESSES
   An MDP is defined by a state space S, an action space A, a transition function T(s' | s, a), and a reward function R(s, a).

7. DISCOUNT FACTOR
   We want to be greedy but not impulsive.
   Discounting implicitly takes uncertainty in the dynamics into account.
   Mathematically, a discount factor γ < 1 keeps infinite-horizon returns finite.
   Return: G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ... = Σ_{k≥0} γ^k r_{t+k}
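A quick numerical illustration (the reward sequence and γ here are made up, not from the slides):

```python
# Discounted return G_0 for an illustrative reward sequence with gamma = 0.9.
gamma = 0.9
rewards = [1.0, 0.0, 2.0, 1.0]

G = sum((gamma ** k) * r for k, r in enumerate(rewards))
print(G)  # 1.0 + 0.9*0.0 + 0.81*2.0 + 0.729*1.0 = 3.349
```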

8. SOLVING AN MDP
   Objective: maximize the expected discounted return, J(π) = E_π[Σ_t γ^t r_t].
   Goal: find an optimal policy π* = argmax_π J(π).

9. VALUE FUNCTIONS
   Value: the expected gain (return) starting from a state.
   Q function: the action-specific value function.
   Advantage function: how much more valuable an action is than the policy's average behavior in that state.
   Value depends on future rewards, which in turn depend on the policy.
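The definitions appear only as images in the deck; the standard forms that these bullets describe are:

```latex
V^{\pi}(s)   = \mathbb{E}_{\pi}\Big[\textstyle\sum_{k \ge 0} \gamma^{k} r_{t+k} \,\Big|\, s_t = s\Big]
\qquad
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{k \ge 0} \gamma^{k} r_{t+k} \,\Big|\, s_t = s,\ a_t = a\Big]
\qquad
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)
```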

10. TABULAR SOLUTION: POLICY ITERATION
    Alternate two steps until the policy stops changing (sketched below):
    Policy Evaluation: compute V^π for the current policy.
    Policy Update: make the policy greedy with respect to the new value estimates.
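A compact sketch of tabular policy iteration, assuming the dynamics are known and stored as P[s][a] = list of (prob, next_state, reward); this data layout and the function name are illustrative assumptions, not from the slides:

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-6):
    """Tabular policy iteration. P[s][a] is a list of (prob, next_state, reward)."""
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: iterate the Bellman expectation backup for the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the new value estimates.
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```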

11. Q LEARNING
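The update rule on this slide is an image; the standard tabular Q-learning update it refers to is Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)], for example:

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update: move Q[s, a] toward the one-step TD target."""
    td_target = r + gamma * np.max(Q[s_next])   # bootstrap from the greedy next action
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```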

12. FUNCTION APPROXIMATION
    Model: a parameterized value function, e.g. Q_θ(s, a).
    Training data: observed transitions (s, a, r, s').
    Loss function: squared TD error, L(θ) = (y - Q_θ(s, a))², where the target is y = r + γ max_a' Q_θ(s', a').

13. IMPLEMENTATION
    Action-in vs. action-out architectures: the network can take (s, a) as input and output a single value, or take s and output one value per action.
    Off-Policy Learning: the target depends in part on our model, so old observations are still useful.
    Use a replay buffer of the most recent transitions as the dataset (see the sketch below).
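A minimal replay buffer along these lines; the capacity and uniform sampling are illustrative choices, not specified in the slides:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of the most recent transitions, sampled uniformly at random."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))          # states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```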

14. DEEP Q NETWORKS (DQN) Mnih et al. (2015)
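The DQN details on this slide are figures; below is a rough sketch of one training step under the usual assumptions. `q_net` and `target_net` are networks mapping a batch of states to per-action values and `buffer` is a replay buffer like the one above; none of these names come from the slides.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, buffer, batch_size=32, gamma=0.99):
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    states      = torch.as_tensor(states, dtype=torch.float32)
    actions     = torch.as_tensor(actions, dtype=torch.int64)
    rewards     = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones       = torch.as_tensor(dones, dtype=torch.float32)

    # Q_theta(s, a) for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TD target uses a separate, periodically updated target network.
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_states).max(dim=1).values

    loss = F.smooth_l1_loss(q_sa, target)   # Huber loss, a common form of error clipping
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```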

15. DQN ISSUES
    Convergence is not guaranteed, so hope for deep magic! Tricks that help in practice:
    Replay buffer
    Error clipping
    Reward scaling
    Using replicas (a separate target network)
    Double Q Learning: decouple action selection from value estimation
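To illustrate the decoupling idea in the deep setting (the Double DQN form of van Hasselt et al., 2016; the slide does not spell this out), the target from the DQN sketch above changes as follows:

```python
# Standard DQN target: the target network both selects and evaluates the next action.
target = rewards + gamma * (1 - dones) * target_net(next_states).max(dim=1).values

# Double DQN target: the online network selects the action, the target network evaluates it.
best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
target = rewards + gamma * (1 - dones) * target_net(next_states).gather(1, best_actions).squeeze(1)
```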

16. POLICY GRADIENTS
    Parameterize the policy and update those parameters directly.
    Enables new kinds of policies: stochastic policies, continuous action spaces.
    On-policy learning: learn directly from your own actions.

17. POLICY GRADIENTS
    Approximate the expectation with an average over sampled trajectories.
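The gradient being estimated (shown as an image in the deck) is the standard score-function / likelihood-ratio form:

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\tau \sim \pi_{\theta}}\!\Big[\sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, G_t\Big]
  \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t} \nabla_{\theta} \log \pi_{\theta}\big(a_t^{(i)} \mid s_t^{(i)}\big)\, G_t^{(i)}
```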

18. REINFORCE Sutton et al. (2000)
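A minimal REINFORCE-style update; it assumes `policy(states)` returns a torch Categorical distribution over discrete actions, and all names are placeholders rather than details from the slides:

```python
import torch

def reinforce_update(policy, optimizer, episode, gamma=0.99):
    """episode: list of (state, action, reward) tuples from one on-policy rollout."""
    states, actions, rewards = zip(*episode)

    # Discounted return G_t for every timestep, computed backwards through the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    # Surrogate loss whose gradient is the REINFORCE estimator: -sum_t log pi(a_t | s_t) * G_t.
    states  = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    log_probs = policy(states).log_prob(actions)
    loss = -(log_probs * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```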

19. VARIANCE REDUCTION
    Constant offsets in the returns make it harder to distinguish the right update direction from noise.
    Remove the offset by subtracting an a priori estimate of the value of each state (a baseline).
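Written out (standard baseline form, not reproduced in the transcript), subtracting a state-dependent baseline b(s_t), typically an estimate of V(s_t), leaves the gradient unbiased while reducing its variance:

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\pi_{\theta}}\!\Big[\sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\,\big(G_t - b(s_t)\big)\Big]
```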

20. ADVANCED POLICY GRADIENT METHODS
    For stochastic functions, the plain gradient is not the best update direction; consider the KL divergence between the old and the new policy.
    NPG: approximate the Fisher information matrix and follow the natural gradient.
    TRPO: compute gradients subject to a KL constraint.
    PPO: compute gradients with a KL penalty.
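For reference (standard forms, not reproduced from the slides), the constrained and penalized surrogate objectives behind TRPO and the KL-penalty variant of PPO can be written as:

```latex
\text{TRPO:}\quad
\max_{\theta}\ \mathbb{E}\!\left[\frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s,a)\right]
\quad \text{s.t.} \quad
\mathbb{E}\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\|\,\pi_{\theta}(\cdot \mid s)\big)\right] \le \delta

\text{PPO (KL penalty):}\quad
\max_{\theta}\ \mathbb{E}\!\left[\frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s,a)
\;-\; \beta\, D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\|\,\pi_{\theta}(\cdot \mid s)\big)\right]
```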

21. ADVANCED POLICY GRADIENT METHODS Heess et al. (2017), Rajeswaran et al. (2017)

22. ACTOR CRITIC
    Critic: trained with a Q-learning-style value update; estimates the advantage of the actor's actions.
    Actor: proposes actions; trained with a policy gradient update weighted by the critic's advantage estimates (sketched below).
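A bare-bones one-step advantage actor-critic update along these lines; `actor(states)` returning a torch Categorical distribution and `critic(states)` returning state values are interface assumptions for the sketch, not details from the slides:

```python
import torch

def actor_critic_update(actor, critic, actor_opt, critic_opt,
                        states, actions, rewards, next_states, dones, gamma=0.99):
    states      = torch.as_tensor(states, dtype=torch.float32)
    actions     = torch.as_tensor(actions, dtype=torch.int64)
    rewards     = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones       = torch.as_tensor(dones, dtype=torch.float32)

    # Critic: regress V(s) toward the one-step TD target.
    values = critic(states).squeeze(-1)
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * critic(next_states).squeeze(-1)
    critic_loss = torch.nn.functional.mse_loss(values, targets)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: policy gradient step weighted by the estimated advantage (the TD error).
    advantage = (targets - values).detach()
    log_probs = actor(states).log_prob(actions)
    actor_loss = -(log_probs * advantage).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```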

23. ASYNC ADVANTAGE ACTOR-CRITIC (A3C) Mnih et al. (2016)

24. DDPG
    Off-policy learning using deterministic policy gradients.
    Max Ferguson (2017)
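The deterministic policy gradient that DDPG builds on (standard form from Silver et al., 2014 and Lillicrap et al., 2016; not written out in the transcript) pushes the deterministic actor μ_θ uphill through the critic:

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{s \sim \rho}\!\left[\, \nabla_{a} Q_{\phi}(s, a)\big|_{a = \mu_{\theta}(s)}\; \nabla_{\theta}\, \mu_{\theta}(s) \right]
```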
