Research/RL_DeepMind
-
[Lecture 12] (1/2) Deep Reinforcement Learning (2024. 8. 9. 11:47)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2012-%20Deep%20RL%201%20.pdf
https://www.youtube.com/watch?v=cVzvNZOBaJ4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=12
Deep reinforcement learning combines reinforcement learning algorithms with deep neural networks used as function approximators. So the motivation for function approximation and t..
-
[Lecture 11] (2/2) Off-Policy and Multi-Step Learning (2024. 8. 9. 01:04)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2011-%20Off-policy%20and%20multi-step.pdf
https://www.youtube.com/watch?v=u84MFu1nG4g&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=11
We can extend this idea of using control variates to the multi-step case. In order to do that, we're going to consider a generic multi-step update. Here we're going to consider the lambda..
-
[Lecture 11] (1/2) Off-Policy and Multi-Step Learning (2024. 8. 8. 18:30)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2011-%20Off-policy%20and%20multi-step.pdf
https://www.youtube.com/watch?v=u84MFu1nG4g&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=11
Why is this important? There are a number of reasons. First, what is off-policy learning? Off-policy learning is learning about a different policy than the one used to generate the data...
-
[Lecture 9] (2/2) Policy Gradients and Actor Critics (2024. 8. 5. 01:07)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%209-%20Policy%20gradients%20and%20actor%20critics.pdf
https://www.youtube.com/watch?v=y3oqOjHilio&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=9
What is an Actor Critic? An Actor Critic is an agent that has an actor (a policy), but also a value estimate (a critic). And we're going to talk about some concrete reas..
-
[Lecture 9] (1/2) Policy Gradients and Actor Critics (2024. 8. 4. 17:08)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%209-%20Policy%20gradients%20and%20actor%20critics.pdf
https://www.youtube.com/watch?v=y3oqOjHilio&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=9
One should not solve a more general problem as an intermediate step. If you are going to solve a more general problem, then this is going to almost necessarily be harder. It..
-
[Lecture 8] (2/2) Planning and Models (2024. 8. 4. 00:33)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%208%20-%20Model%20Based%20Reinforcement%20Learning.pdf
https://www.youtube.com/watch?v=FKl8kM4finE&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=8
Let's discuss a concrete instantiation of Dyna, because this idea is very general, and of course we could plug many different algorithms into it and apply very different update..
-
[Lecture 8] (1/2) Planning and Models (2024. 8. 3. 17:11)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%208%20-%20Model%20Based%20Reinforcement%20Learning.pdf
https://www.youtube.com/watch?v=FKl8kM4finE&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=8
If we look back at dynamic programming and model-free algorithms, we can roughly sketch the underlying principles and differences between the two in the following way. So i..
-
[Lecture 7] (2/2) Function Approximation (2024. 8. 3. 09:55)
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%207-%20Function%20approximation%20in%20reinforcement%20learning%20.pdf
https://www.youtube.com/watch?v=ook46h2Jfb4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=7
Now we can't update towards the true value function if we don't have that yet, so instead we're going to substitute the targets. For Monte Carlo, we could p..