All Posts
-
[Lecture 3] Markov Decision Processes and Dynamic Programming | Research/RL_DeepMind | 2024. 7. 29. 18:11
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%203%20-%20MDPs%20and%20Dynamic%20Programming.pdf
https://www.youtube.com/watch?v=zSOMeug_i_M&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=3
Anything that has happened in this interaction process is summarized in the current state. In terms of its information about the future transition is summarized in the current st..
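The excerpt above is stating the Markov property. As a hedged restatement in standard notation (my own rendering, not copied from the slides), "everything that has happened is summarized in the current state" can be written as:

```latex
% Markov property: the current state S_t carries all the information the
% full history provides about the next state.
P\bigl(S_{t+1} \mid S_t\bigr) \;=\; P\bigl(S_{t+1} \mid S_1, S_2, \ldots, S_t\bigr)
```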
-
[Lecture 2] Exploration and Exploitation | Research/RL_DeepMind | 2024. 7. 29. 17:58
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%202-%20Exploration%20and%20control_slides.pdf
https://www.youtube.com/watch?v=aQJP3Z2Ho8U&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=2
One interesting property that we're going to talk about a lot is that it's an active learning setting. This means that these actions don't just change the reward. They don't jus..
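A minimal sketch of the "active" aspect the excerpt points at: in a bandit loop, the data the agent observes depends on the action it chooses, so an ε-greedy learner has to explore occasionally. The arm means, exploration rate, and update rule below are illustrative assumptions, not values from the lecture.

```python
import random

# Toy multi-armed bandit: the observed reward depends on the chosen arm,
# so the agent's own actions determine what data it gets to learn from.
true_means = [0.1, 0.5, 0.8]          # hidden reward probabilities (assumed)
estimates = [0.0] * len(true_means)   # running value estimates Q(a)
counts = [0] * len(true_means)
epsilon = 0.1                         # exploration rate

for t in range(1000):
    if random.random() < epsilon:
        action = random.randrange(len(true_means))                          # explore
    else:
        action = max(range(len(true_means)), key=lambda a: estimates[a])    # exploit
    reward = 1.0 if random.random() < true_means[action] else 0.0
    counts[action] += 1
    # incremental mean update: Q(a) <- Q(a) + (r - Q(a)) / N(a)
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)
```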
-
[Lecture 1] Introduction - Reinforcement Learning | Research/RL_DeepMind | 2024. 7. 29. 17:57
https://storage.googleapis.com/deepmind-media/UCL%20x%20DeepMind%202021/Lecture%201%20-%20introduction.pdf
https://www.youtube.com/watch?v=TCCjZe0y4Qc&list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm&index=1
http://incompleteideas.net/book/the-book-2nd.html
You are subjected to some data or experience, but the experience is not fully out of your control. The actions that you take might influence the experi..
-
Generating Long Sequences with Sparse Transformers | Research/NLP_Paper | 2024. 7. 28. 01:19
https://arxiv.org/pdf/1904.10509
Abstract: Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to O(n√n). We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matr..
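As a rough illustration of the idea in the abstract (sparse factorizations of the attention matrix), the sketch below builds a strided sparsity mask where each position attends to a local window plus every stride-th "anchor" position, giving O(√n) allowed entries per row when the stride is near √n. The block size, stride choice, and NumPy formulation are my assumptions, not the paper's exact factorization.

```python
import numpy as np

def strided_sparse_mask(n: int, stride: int) -> np.ndarray:
    """Boolean (n, n) causal mask: position i may attend to position j iff
    j <= i and (i - j < stride  or  j % stride == stride - 1).
    Mimics a 'local window + periodic anchors' sparse attention pattern."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = (i - j) < stride                 # recent positions
    summary = (j % stride) == (stride - 1)   # periodic anchor positions
    return causal & (local | summary)

mask = strided_sparse_mask(n=16, stride=4)
print(mask.astype(int))
print("allowed entries per row:", mask.sum(axis=1))
```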
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Research/NLP_Paper | 2024. 7. 27. 18:12
https://arxiv.org/pdf/2005.11401
Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architecture..
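A hedged sketch of the retrieve-then-generate pattern the abstract describes: a non-parametric memory (a document index) is queried first, and the retrieved passages are passed to the generator together with the question. The toy corpus, overlap-based scorer, and function names below are my own placeholders, not the paper's dense retriever or seq2seq generator.

```python
# Toy retrieve-then-generate pipeline (illustrative only; the paper retrieves
# dense passages from Wikipedia and marginalizes over them with a seq2seq model).
corpus = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain on Earth.",
    "Python was created by Guido van Rossum.",
]

def retrieve(question: str, k: int = 2) -> list:
    # stand-in scorer: word overlap instead of dense inner-product retrieval
    q_words = set(question.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def generate(question: str, passages: list) -> str:
    # stand-in for the generator: just shows the context it would condition on
    context = " ".join(passages)
    return f"[context: {context}] -> answer to: {question}"

print(generate("Who created Python?", retrieve("Who created Python?")))
```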
-
Self-Consistency Improves Chain of Thought Reasoning in Language Models | Research/NLP_Paper | 2024. 7. 27. 14:31
https://arxiv.org/pdf/2203.11171
Abstract: Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the gre..
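A minimal sketch of the decoding strategy named in the abstract: instead of one greedy chain of thought, sample several reasoning paths and keep the most consistent final answer by majority vote. `sample_reasoning_path` is a hypothetical stand-in for a temperature-sampled language model call.

```python
import random
from collections import Counter

def sample_reasoning_path(question: str) -> tuple:
    """Hypothetical stand-in for sampling one chain of thought plus a final
    answer from an LM; different samples may disagree on the answer."""
    answer = random.choice(["18", "18", "18", "26"])  # noisy but mostly consistent
    return (f"reasoning path for: {question}", answer)

def self_consistent_answer(question: str, n_samples: int = 10) -> str:
    answers = [sample_reasoning_path(question)[1] for _ in range(n_samples)]
    # marginalize out the reasoning paths by majority vote over final answers
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("Janet has 3 boxes of 6 eggs..."))
```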
-
Prefix-Tuning: Optimizing Continuous Prompts for Generation | Research/NLP_Paper | 2024. 7. 27. 11:11
https://arxiv.org/pdf/2101.00190
Abstract: Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps la..
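A hedged sketch of the parameter split the abstract describes: the pretrained model's weights stay frozen and only a small set of continuous prefix vectors is trained. The PyTorch module below is an illustrative simplification (trainable embeddings prepended to the input), not the paper's exact reparameterized key/value prefixes, and the tiny encoder is a stand-in for a real pretrained model.

```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Illustrative prefix-tuning wrapper: only `prefix` receives gradients."""
    def __init__(self, frozen_encoder: nn.Module, d_model: int, prefix_len: int = 10):
        super().__init__()
        self.encoder = frozen_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False            # keep the pretrained model frozen
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        batch = token_embeddings.shape[0]
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prefix, token_embeddings], dim=1))

# usage sketch with a small stand-in encoder
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True), num_layers=1)
model = PrefixTunedEncoder(encoder, d_model=32, prefix_len=5)
out = model(torch.randn(2, 8, 32))
print(out.shape)  # torch.Size([2, 13, 32])
```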
-
[LoRA] Low-rank Adaptation of Large Language Models | Research/NLP_Paper | 2024. 7. 27. 00:44
https://arxiv.org/pdf/2106.09685
Abstract: An important paradigm of natural language processing consists of large-scale pretraining on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, eac..
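A minimal sketch of the low-rank adaptation idea behind the abstract: freeze the pretrained weight W and learn a rank-r update BA, so the effective weight is W + BA and only a small fraction of parameters is trained per task. The PyTorch module below uses an assumed rank and scaling for illustration; it is not the reference `loralib` implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                               # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the scaled low-rank correction x @ (B @ A)^T
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(64, 64), r=4)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("trainable parameters:", trainable)  # 512 (A and B only)
```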