Research
-
[Lecture 6] (2/2) Model-Free Control | Research/RL_DeepMind | 2024. 8. 2. 23:05
https://www.youtube.com/watch?v=t9uf9cuogBo&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=6&t=1068s
So that's one way to go: Monte Carlo learning and Temporal Difference learning. In both cases, what we're doing is interleaving a policy evaluation and a policy improvement step. Now we're going to turn to a new topic, which is Off-policy learning, which is about learning about a policy differ..
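The preview above cuts off at off-policy learning. The canonical off-policy control method covered in this lecture series is Q-learning, whose update bootstraps from the greedy action regardless of what the behaviour policy does. A minimal tabular sketch (function and variable names are my own, not from the lecture):

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One off-policy TD update: the target uses max_a' Q(s', a'),
    independently of which action the behaviour policy takes next."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

Because the target is the greedy value, the learned Q approximates the optimal policy's values even while the agent behaves (for example) epsilon-greedily.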
-
[Lecture 6] (1/2) Model-Free Control | Research/RL_DeepMind | 2024. 7. 31. 00:02
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%206%20-%20Model-free%20control.pdf
https://www.youtube.com/watch?v=t9uf9cuogBo&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=6
Policy iteration refers to interleaving two separate steps, which are called policy evaluation and policy improvement. We start with some arbitrary initial value function; for instance they coul..
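The two interleaved steps in the preview can be sketched for a tiny deterministic MDP (this is my own illustrative code, not from the lecture; `P[s][a]` gives the next state and `R[s][a]` the reward):

```python
def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Alternate policy evaluation and greedy policy improvement
    until the policy is stable (deterministic MDP for brevity)."""
    states, actions = list(P), list(next(iter(P.values())))
    pi = {s: actions[0] for s in states}   # arbitrary initial policy
    V = {s: 0.0 for s in states}           # arbitrary initial values
    while True:
        # policy evaluation: sweep V until it converges under pi
        while True:
            delta = 0.0
            for s in states:
                v = R[s][pi[s]] + gamma * V[P[s][pi[s]]]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # policy improvement: act greedily with respect to V
        stable = True
        for s in states:
            best = max(actions, key=lambda a: R[s][a] + gamma * V[P[s][a]])
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return pi, V
```

On a two-state chain where only one state pays reward, this converges in a couple of improvement rounds to the policy that moves to, and then stays in, the rewarding state.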
-
[Lecture 5] Model-Free Prediction | Research/RL_DeepMind | 2024. 7. 30. 15:54
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%205%20-%20ModelFreePrediction.pdf
https://www.youtube.com/watch?v=eaWfWoVUTEw&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=5
In general, in reinforcement learning, when people say Monte Carlo they typically mean sampling complete episodes; an episode is a trajectory of experience which has some sort of a natural ending ..
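Sampling complete episodes is exactly what makes Monte Carlo prediction possible: once an episode has ended, the full return from every visited state is known. A first-visit Monte Carlo sketch (my own minimal version; each episode is a list of `(state, reward)` pairs):

```python
def first_visit_mc_values(episodes, gamma=1.0):
    """Estimate V(s) by averaging the return following the first
    visit to each state, over complete episodes."""
    totals, counts = {}, {}
    for episode in episodes:
        G = 0.0
        returns = []
        for state, reward in reversed(episode):  # accumulate returns backwards
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()
        seen = set()
        for state, G in returns:
            if state not in seen:                # first visit only
                seen.add(state)
                totals[state] = totals.get(state, 0.0) + G
                counts[state] = counts.get(state, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}
```

Note the episode must terminate before any estimate can be updated, which is the key contrast with the Temporal Difference methods in the same lecture.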
-
[Lecture 4] Theoretical Fundamentals of Dynamic Programming | Research/RL_DeepMind | 2024. 7. 30. 14:47
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%204%20-%20Theoretical%20Fundamentals%20of%20DP%20Algorithms.pdf
https://www.youtube.com/watch?v=XpbLq7rIJAA&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=4
For any two points in this vector space, if you apply this contraction mapping to both points, the distance between them shrinks by at least a factor of alpha. For any seq..
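The contraction property in the preview is what the Banach fixed-point theorem turns into convergence: iterating an alpha-contraction from any starting point converges to its unique fixed point. A scalar toy illustration (in the lecture the space is value functions under the max-norm; this sketch and its names are mine):

```python
def fixed_point(T, x0, tol=1e-10, max_iter=10_000):
    """Iterate an alpha-contraction T from x0. Each application shrinks
    the distance to the fixed point by at least a factor of alpha, so
    the iterates converge geometrically."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x
```

For example, T(x) = 0.5x + 1 is a contraction with alpha = 0.5 and fixed point 2; the Bellman optimality operator plays the same role with alpha = gamma.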
-
[Lecture 3] Markov Decision Processes and Dynamic Programming | Research/RL_DeepMind | 2024. 7. 29. 18:11
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%203%20-%20MDPs%20and%20Dynamic%20Programming.pdf
https://www.youtube.com/watch?v=zSOMeug_i_M&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=3
Anything that has happened in this interaction process is summarized in the current state. In terms of its information about the future, transition is summarized in the current st..
-
[Lecture 2] Exploration and Exploitation | Research/RL_DeepMind | 2024. 7. 29. 17:58
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%202-%20Exploration%20and%20control_slides.pdf
https://www.youtube.com/watch?v=aQJP3Z2Ho8U&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=2
One interesting property that we're going to talk about a lot is that it's an active learning setting. This means that these actions don't just change the reward. They don't jus..
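The active-learning point is usually introduced via the multi-armed bandit, where the agent's own action choices determine which rewards it gets to observe. A minimal epsilon-greedy sketch (names and the Gaussian reward model are my own choices, not from the slides):

```python
import random

def epsilon_greedy_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Sample-average action values with epsilon-greedy exploration on a
    Gaussian bandit; arm_means are the true expected rewards per arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    Q = [0.0] * n_arms   # estimated value per arm
    N = [0] * n_arms     # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)                  # explore
        else:
            a = max(range(n_arms), key=Q.__getitem__)  # exploit
        r = rng.gauss(arm_means[a], 1.0)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                      # incremental mean
    return Q, N
```

Because only the chosen arm's reward is observed, the pull counts concentrate on the best arm, which is the exploration/exploitation trade-off in its purest form.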
-
[Lecture 1] Introduction - Reinforcement Learning | Research/RL_DeepMind | 2024. 7. 29. 17:57
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%201%20-%20introduction.pdf
https://www.youtube.com/watch?v=TCCjZe0y4Qc&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=1
http://incompleteideas.net/book/the-book-2nd.html
You are subjected to some data or experience, but the experience is not fully out of your control. The actions that you take might influence the experi..
-
Generating Long Sequences with Sparse Transformers | Research/NLP_Paper | 2024. 7. 28. 01:19
https://arxiv.org/pdf/1904.10509
Abstract: Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to O(n√n). We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matr..
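The O(n√n) cost comes from factorizing full causal attention into sparse patterns. One of the paper's patterns, strided attention, can be sketched as a boolean mask (my own simplified reading: each position attends to a local window plus every stride-th earlier position; the real implementation splits these across two heads):

```python
def strided_attention_mask(n, stride):
    """Causal strided sparsity pattern: position i may attend to j <= i
    when j is within the last `stride` positions (local) or when
    (i - j) is a multiple of `stride` (strided)."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only j <= i
            local = i - j < stride
            strided = (i - j) % stride == 0
            mask[i][j] = local or strided
    return mask
```

With stride ≈ √n, each row has O(√n) allowed positions, giving the O(n√n) total the abstract states.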