Research/RL_DeepMind
-
PPO & RLHF & DPO
Research/RL_DeepMind 2024. 8. 11. 16:11
https://www.youtube.com/watch?v=SgC6AZss478&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=1
https://www.youtube.com/watch?v=TjHH_--7l8g&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=2
https://www.youtube.com/watch?v=Z_JUqJBpVOk&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=3
https://www.youtube.com/watch?v=k2pD3k1485A&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=4
The idea is that we have a Trans..
-
[6/6] Policy Gradient Methods
Research/RL_DeepMind 2024. 8. 10. 17:23
https://www.youtube.com/watch?v=e20EY4tFC_Q&list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr&index=6
Policy gradient methods take a more direct approach to the problem statement of RL, and as a result, many of the most effective models are from this category. For example, Proximal Policy Optimization is a type of policy gradient method, and that's OpenAI's go-to RL algorithm. In fact, that's what they use t..
-
[Lecture 12] (2/2) Deep Reinforcement Learning
Research/RL_DeepMind 2024. 8. 9. 16:18
https://www.youtube.com/watch?v=cVzvNZOBaJ4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=12
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%2012-%20Deep%20RL%201%20.pdf
In this section, I want to give you some insight into what happens when ideas from reinforcement learning are combined with deep learning, both in terms of how known RL issues manifest when using deep..