All Posts
-
[Lecture 9] (1/2) Policy Gradients and Actor Critics | Research/RL_DeepMind | 2024. 8. 4. 17:08
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%209-%20Policy%20gradients%20and%20actor%20critics.pdf
https://www.youtube.com/watch?v=y3oqOjHilio&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=9
One should not solve a more general problem as an intermediate step. If you are going to solve a more general problem, then this is going to almost necessarily be harder. It..
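The quoted principle motivates learning the policy directly instead of first solving the more general problem of estimating a full value function. As a rough, hedged illustration (not the post's own code), here is a minimal tabular REINFORCE sketch; the Gym-like env.reset/env.step interface and all names are assumptions for the example.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce_episode(theta, env, gamma=0.99, alpha=0.01):
    """One REINFORCE episode: theta += alpha * G_t * grad log pi(a_t|s_t).
    theta is a (n_states, n_actions) array of softmax preferences (tabular policy)."""
    states, actions, rewards = [], [], []
    s, done = env.reset(), False
    while not done:
        probs = softmax(theta[s])                  # pi(.|s) for the current parameters
        a = np.random.choice(len(probs), p=probs)  # sample from the policy itself
        s_next, r, done = env.step(a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
    G = 0.0
    for t in reversed(range(len(rewards))):        # accumulate returns backwards
        G = rewards[t] + gamma * G
        probs = softmax(theta[states[t]])
        grad_log = -probs
        grad_log[actions[t]] += 1.0                # gradient of log-softmax w.r.t. preferences
        theta[states[t]] += alpha * G * grad_log
    return theta
```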
-
[Lecture 8] (2/2) Planning and Models | Research/RL_DeepMind | 2024. 8. 4. 00:33
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%208%20-%20Model%20Based%20Reinforcement%20Learning.pdf
https://www.youtube.com/watch?v=FKl8kM4finE&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=8
Let's discuss a concrete instantiation of Dyna, because this idea is very general and of course we could plug many different algorithms into it and apply very different update..
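The preview is cut off, but one common concrete instantiation is Dyna-Q: Q-learning on real experience interleaved with planning updates replayed from a learned tabular model. The sketch below assumes a deterministic environment and a hypothetical env.reset/env.step interface; it is an illustration of the idea, not the post's code.

```python
import random
import numpy as np

def dyna_q(env, n_states, n_actions, episodes=100, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Dyna-Q sketch: direct RL updates from real experience plus planning
    updates sampled from a learned (deterministic, tabular) model."""
    Q = np.zeros((n_states, n_actions))
    model = {}                                   # (s, a) -> (r, s')
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            a = random.randrange(n_actions) if random.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # (1) direct RL: Q-learning update from the real transition
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() * (not done) - Q[s, a])
            # (2) model learning: remember the last observed transition
            model[(s, a)] = (r, s_next)
            # (3) planning: replay transitions sampled from the model
            for _ in range(n_planning):
                ps, pa = random.choice(list(model.keys()))
                pr, ps_next = model[(ps, pa)]
                Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
            s = s_next
    return Q
```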
-
[Lecture 8] (1/2) Planning and Models | Research/RL_DeepMind | 2024. 8. 3. 17:11
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%208%20-%20Model%20Based%20Reinforcement%20Learning.pdf
https://www.youtube.com/watch?v=FKl8kM4finE&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=8
If we look back at dynamic programming and model-free algorithms, we can roughly sketch the underlying principles and differences between the two in the following way. So i..
-
[Lecture 7] (2/2) Function Approximation | Research/RL_DeepMind | 2024. 8. 3. 09:55
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%207-%20Function%20approximation%20in%20reinforcement%20learning%20.pdf
https://www.youtube.com/watch?v=ook46h2Jfb4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=7
Now we can't update towards the true value function if we don't have that yet, so instead we're going to substitute the targets. For Monte Carlo, we could p..
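A hedged sketch of the target substitution the excerpt refers to: with a parametric estimate v(s; w), the semi-gradient update stays the same and only the target changes, the Monte Carlo return for MC or the bootstrapped one-step target for TD(0). The linear feature parameterisation and helper names below are assumptions for the example.

```python
import numpy as np

def semi_gradient_update(w, phi_s, target, alpha=0.01):
    """Semi-gradient update for a linear value function v(s; w) = w . phi(s):
    w += alpha * (target - v(s; w)) * phi(s), where 'target' stands in for
    the unknown true value v_pi(s)."""
    v_s = w @ phi_s
    return w + alpha * (target - v_s) * phi_s

def mc_target(G_t):
    """Monte Carlo: substitute the full sampled return G_t."""
    return G_t

def td0_target(r, gamma, w, phi_s_next, done):
    """TD(0): substitute the bootstrapped target r + gamma * v(s'; w)."""
    return r + (0.0 if done else gamma * (w @ phi_s_next))
```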
-
[Lecture 7] (1/2) Function Approximation | Research/RL_DeepMind | 2024. 8. 3. 00:09
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%207-%20Function%20approximation%20in%20reinforcement%20learning%20.pdf
https://www.youtube.com/watch?v=ook46h2Jfb4&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=7
The policy, value function, model, and agent state update, all of these things that are inside the agent can be viewed as being functions. For instance, a po..
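A minimal illustration of that "everything inside the agent is a function" view, assuming a simple linear parameterisation; the feature vector, dimensions, and names are made up for the example.

```python
import numpy as np

def value(w, phi_s):
    """v(s; w): a value function maps state features to a scalar."""
    return float(w @ phi_s)

def policy(theta, phi_s):
    """pi(.|s; theta): a policy maps state features to a distribution over actions
    (softmax over action preferences, one row of theta per action)."""
    prefs = theta @ phi_s
    prefs = prefs - prefs.max()
    p = np.exp(prefs)
    return p / p.sum()

phi = np.array([1.0, 0.5, -0.2])   # made-up state features
w = np.zeros(3)                     # value-function parameters
theta = np.zeros((4, 3))            # policy parameters for 4 actions
print(value(w, phi), policy(theta, phi))
```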
-
[Lecture 6] (2/2) Model-Free Control | Research/RL_DeepMind | 2024. 8. 2. 23:05
https://www.youtube.com/watch?v=t9uf9cuogBo&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=6&t=1068s
So that's one way to go: Monte Carlo learning and Temporal Difference learning. And in both cases what we're doing is interleaving policy evaluation and policy improvement steps. Now we're going to turn to a new topic, which is off-policy learning, which is about learning about a policy differ..
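Off-policy learning means learning about one policy (for example the greedy target policy) from data generated by a different behaviour policy. Q-learning is the standard example; the sketch below assumes a tabular setting and a hypothetical env.reset/env.step interface.

```python
import random
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Off-policy TD control: behave epsilon-greedily, but bootstrap from the
    greedy (target-policy) value max_a' Q(s', a')."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # behaviour policy: epsilon-greedy
            a = random.randrange(n_actions) if random.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # target policy: greedy; this mismatch is what makes the update off-policy
            td_target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q
```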
-
[Lecture 6] (1/2) Model-Free Control | Research/RL_DeepMind | 2024. 7. 31. 00:02
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%206%20-%20Model-free%20control.pdf
https://www.youtube.com/watch?v=t9uf9cuogBo&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=6
Policy iteration refers to interleaving two separate steps which are called policy evaluation and policy improvement. We start with some arbitrary initial value function for instance they coul..
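A hedged sketch of the interleaving described in the excerpt, for a known tabular MDP: evaluate the current policy to convergence, then improve it by acting greedily on the evaluated values, and repeat until the policy is stable. The transition array P and reward array R are assumed inputs for the example, not something defined in the post.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, tol=1e-8):
    """Policy iteration for a known tabular MDP.
    P[s, a, s'] : transition probabilities, R[s, a] : expected rewards."""
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)      # arbitrary initial (deterministic) policy
    V = np.zeros(n_states)                      # arbitrary initial value function
    while True:
        # policy evaluation: iterate the Bellman expectation backup to convergence
        while True:
            V_new = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                              for s in range(n_states)])
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
        # policy improvement: act greedily with respect to the evaluated values
        Q = R + gamma * (P @ V)                 # Q[s, a] via one-step lookahead
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # stable policy => done
            return policy, V
        policy = new_policy
```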
-
[Lecture 5] Model-Free Prediction | Research/RL_DeepMind | 2024. 7. 30. 15:54
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%205%20-%20ModelFreePrediction.pdf
https://www.youtube.com/watch?v=eaWfWoVUTEw&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=5
In general, in reinforcement learning when people say Monte Carlo, they typically mean sampling complete episodes; an episode is a trajectory of experience which has some sort of natural ending ..
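A hedged first-visit Monte Carlo prediction sketch that matches this idea of sampling complete episodes: value estimates are averages of returns observed from each state. The episode format, a list of (s_t, r_{t+1}) pairs ending at a terminal state, is an assumption for the example.

```python
from collections import defaultdict

def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate v_pi(s) by averaging sampled returns over complete episodes.
    Each episode is a list of (state, reward) pairs, where the reward is the
    one received after leaving that state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # compute the return G_t from every time step by walking backwards
        returns_at = [0.0] * len(episode)
        G = 0.0
        for t in reversed(range(len(episode))):
            _, r = episode[t]
            G = r + gamma * G
            returns_at[t] = G
        # first-visit: only the first occurrence of each state contributes
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen.add(s)
                returns_sum[s] += returns_at[t]
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```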