Research
-
[Lecture 12] (2/2) Deep Reinforcement Learning
Research/RL_DeepMind 2024. 8. 9. 16:18
https://www.youtube.com/watch?v=cVzvNZOBaJ4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=12
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2012-%20Deep%20RL%201%20.pdf
In this section, I want to give you some insight into what happens when ideas from reinforcement learning are combined with deep learning, both in terms of how known RL issues manifest when using deep..
-
[Lecture 12] (1/2) Deep Reinforcement Learning
Research/RL_DeepMind 2024. 8. 9. 11:47
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2012-%20Deep%20RL%201%20.pdf
https://www.youtube.com/watch?v=cVzvNZOBaJ4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=12
Deep reinforcement learning is the combination of reinforcement learning algorithms with the use of deep neural networks as function approximators. So the motivation for function approximation and t..
-
[Lecture 11] (2/2) Off-Policy and Multi-Step Learning
Research/RL_DeepMind 2024. 8. 9. 01:04
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2011-%20Off-policy%20and%20multi-step.pdf
https://www.youtube.com/watch?v=u84MFu1nG4g&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=11
We can extend this idea of using control variates to the multi-step case. In order to do that, we're going to consider a generic multi-step update. Here we're going to consider the lambda..
-
[Lecture 11] (1/2) Off-Policy and Multi-Step Learning
Research/RL_DeepMind 2024. 8. 8. 18:30
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2011-%20Off-policy%20and%20multi-step.pdf
https://www.youtube.com/watch?v=u84MFu1nG4g&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=11
Why is this important? There are a number of reasons. First, what is off-policy learning? Off-policy learning is learning about a policy different from the one that is used to generate the data...
-
[Lecture 9] (2/2) Policy Gradients and Actor Critics
Research/RL_DeepMind 2024. 8. 5. 01:07
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%209-%20Policy%20gradients%20and%20actor%20critics.pdf
https://www.youtube.com/watch?v=y3oqOjHilio&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=9
What is an Actor Critic? An Actor Critic is an agent that has an actor (a policy) but also has a value estimate (a critic). And we're going to talk about some concrete reas..
-
[Lecture 9] (1/2) Policy Gradients and Actor Critics
Research/RL_DeepMind 2024. 8. 4. 17:08
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%209-%20Policy%20gradients%20and%20actor%20critics.pdf
https://www.youtube.com/watch?v=y3oqOjHilio&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=9
One should not solve a more general problem as an intermediate step. If you are going to solve a more general problem, then this is going to almost necessarily be harder. It..