Research
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Research/NLP_Paper, 2024. 7. 27. 18:12)
https://arxiv.org/pdf/2005.11401
Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architecture…
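A minimal sketch of the retrieve-then-generate idea, using a toy bag-of-words retriever over an in-memory document list in place of the paper's DPR retriever over Wikipedia and its BART generator (the names DOCS, retrieve, and rag_prompt below are illustrative, not the paper's code):

```python
from collections import Counter
import math

# Toy in-memory "non-parametric memory"; the paper uses a dense Wikipedia index.
DOCS = [
    "The Eiffel Tower is located in Paris, France.",
    "Marie Curie won Nobel Prizes in Physics and Chemistry.",
    "The Great Wall of China stretches across northern China.",
]

def bow(text):
    """Bag-of-words vector (stand-in for a dense retriever embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """Return the top-k documents most similar to the query."""
    q = bow(query)
    return sorted(DOCS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def rag_prompt(query):
    """Condition generation on the retrieved passages plus the query."""
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(rag_prompt("Where is the Eiffel Tower?"))
# The resulting prompt would then be fed to a seq2seq generator (BART in the paper).
```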
-
Self-Consistency Improves Chain of Thought Reasoning in Language Models (Research/NLP_Paper, 2024. 7. 27. 14:31)
https://arxiv.org/pdf/2203.11171
Abstract: Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the gre…
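A minimal sketch of the decoding strategy, assuming a hypothetical sample_path function that stands in for temperature sampling from an actual LLM: sample several chains of thought, extract each final answer, and keep the most consistent (majority-vote) one:

```python
import random
from collections import Counter

def sample_path(prompt):
    """Stand-in for sampling one chain of thought from an LLM with temperature > 0.
    Here it just returns one of a few canned reasoning paths."""
    return random.choice([
        "She sells 16 - 3 - 4 = 9 eggs, 9 * $2 = $18. The answer is 18.",
        "16 - 7 = 9 eggs sold at $2 each gives $18. The answer is 18.",
        "16 - 4 = 12, 12 * $2 = $24. The answer is 24.",   # one faulty path
    ])

def extract_answer(path):
    """Pull the final answer out of a reasoning path."""
    return path.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistency(prompt, n_samples=10):
    """Sample n reasoning paths and marginalize over them by majority vote."""
    answers = [extract_answer(sample_path(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Janet's ducks lay 16 eggs per day..."))  # usually '18'
```

The point is that many different correct reasoning paths converge on the same answer, so voting over sampled paths filters out occasional faulty chains that greedy decoding would commit to.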
-
Prefix-Tuning: Optimizing Continuous Prompts for Generation (Research/NLP_Paper, 2024. 7. 27. 11:11)
https://arxiv.org/pdf/2101.00190
Abstract: Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps la…
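A simplified PyTorch sketch of the idea: the pretrained model stays frozen and only a short sequence of continuous prefix vectors is trained. Note the simplification: the paper prepends prefix activations to every layer's keys/values (via a reparameterization MLP), whereas the toy PrefixTunedEncoder below only prepends trainable vectors to the input embeddings of a stand-in encoder.

```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Embedding-level prefix tuning sketch: only the prefix is trainable."""
    def __init__(self, base_encoder, d_model, prefix_len=10):
        super().__init__()
        self.base = base_encoder
        for p in self.base.parameters():        # keep the pretrained model frozen
            p.requires_grad = False
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, token_embeddings):        # (batch, seq, d_model)
        batch = token_embeddings.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return self.base(torch.cat([prefix, token_embeddings], dim=1))

# Toy frozen "language model" and usage.
d_model = 64
base = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
model = PrefixTunedEncoder(base, d_model, prefix_len=5)
out = model(torch.randn(2, 12, d_model))        # (2, 5 + 12, 64)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(out.shape, trainable)                     # only the prefix parameters are trained
```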
-
[LoRA] Low-Rank Adaptation of Large Language Models (Research/NLP_Paper, 2024. 7. 27. 00:44)
https://arxiv.org/pdf/2106.09685
Abstract: An important paradigm of natural language processing consists of large-scale pretraining on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, eac…
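A minimal sketch of a LoRA-adapted linear layer: the pretrained weight W0 stays frozen and only the low-rank factors A and B are trained, with B initialized to zero so the update BA starts from the pretrained behavior (LoRALinear and the r/alpha defaults below are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update: h = W0 x + (alpha/r) B A x."""
    def __init__(self, base_linear: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():                       # W0 and its bias stay frozen
            p.requires_grad = False
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)     # small random init
        self.B = nn.Parameter(torch.zeros(out_f, r))           # zero init: update starts at 0
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
print(layer(torch.randn(4, 768)).shape)   # torch.Size([4, 768]); only A and B get gradients
```

Because the learned update is just the pair (A, B), many task-specific adaptations can share one frozen copy of the base model, and at inference time BA can be merged into W0 so no extra latency is added.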
-
[AdapterFusion] Non-Destructive Task Composition for Transfer Learning (Research/NLP_Paper, 2024. 7. 26. 17:40)
https://arxiv.org/pdf/2005.00247
Abstract: Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowl…
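A rough sketch of the two stages with toy module names: stage 1 trains one bottleneck adapter per task, and stage 2 freezes those adapters and learns a small attention that mixes their outputs per position. The real AdapterFusion uses separate query/key/value projections inside every Transformer layer; the version below is a simplified single mixing layer.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (stage 1: one per task, trained separately)."""
    def __init__(self, d_model, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class AdapterFusion(nn.Module):
    """Stage 2: attend over the outputs of N frozen task adapters."""
    def __init__(self, adapters, d_model):
        super().__init__()
        self.adapters = nn.ModuleList(adapters)
        for p in self.adapters.parameters():       # pretrained task adapters stay fixed
            p.requires_grad = False
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, h):                          # h: (batch, seq, d_model)
        outs = torch.stack([a(h) for a in self.adapters], dim=2)    # (b, s, N, d)
        q = self.query(h).unsqueeze(2)                               # (b, s, 1, d)
        k = self.key(outs)                                           # (b, s, N, d)
        attn = torch.softmax((q * k).sum(-1, keepdim=True), dim=2)   # weights over adapters
        return (attn * outs).sum(dim=2)                              # weighted mix, (b, s, d)

fusion = AdapterFusion([Adapter(64) for _ in range(3)], d_model=64)
print(fusion(torch.randn(2, 10, 64)).shape)        # torch.Size([2, 10, 64])
```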
-
[Adapter] Parameter-Efficient Transfer Learning for NLP (Research/NLP_Paper, 2024. 7. 26. 11:48)
https://arxiv.org/pdf/1902.00751
Abstract: Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few traina…
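A minimal sketch of the adapter module itself: a bottleneck (down-projection, nonlinearity, up-projection) with a skip connection, initialized near the identity so inserting it does not disturb the pretrained network. Only these small modules (plus layer norms and the task head) are trained per task; the sizes below are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, plus a skip connection."""
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)      # near-identity init: adapter output ~= input at start
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

# Compare the adapter's size with one frozen Transformer feed-forward block.
adapter = BottleneckAdapter()
frozen_ffn = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))
adapter_params = sum(p.numel() for p in adapter.parameters())
ffn_params = sum(p.numel() for p in frozen_ffn.parameters())
print(f"adapter params: {adapter_params:,} vs frozen FFN params: {ffn_params:,}")
```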
-
Large Language Models are Zero-Shot Reasoners (Research/NLP_Paper, 2024. 7. 26. 00:22)
https://arxiv.org/pdf/2205.11916
Abstract: Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of…
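A minimal sketch of the paper's two-stage zero-shot prompting, with a canned generate stub standing in for an actual LLM call: first elicit reasoning with the trigger phrase "Let's think step by step.", then extract the final answer from that reasoning (the stub and exact extraction phrase below are illustrative):

```python
def generate(prompt: str) -> str:
    """Stand-in for one completion call to a pretrained LLM."""
    if prompt.rstrip().endswith("Let's think step by step."):
        return " There are 16 balls. Half are golf balls: 8. Half of those are blue: 4."
    return " 4"

def zero_shot_cot(question: str) -> str:
    """Two-stage zero-shot CoT prompting:
    1) reasoning extraction via 'Let's think step by step.'
    2) answer extraction from the generated reasoning."""
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = generate(reasoning_prompt)
    answer_prompt = f"{reasoning_prompt}{reasoning}\nTherefore, the answer is"
    return generate(answer_prompt).strip()

print(zero_shot_cot("A juggler has 16 balls. Half are golf balls, and half of "
                    "the golf balls are blue. How many blue golf balls are there?"))
```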
-
[CoT] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Research/NLP_Paper, 2024. 7. 25. 15:24)
https://arxiv.org/pdf/2201.11903
Abstract: We explore how generating a chain of thought—a series of intermediate reasoning steps—significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain of…
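A minimal sketch of how a chain-of-thought prompt is assembled: each few-shot exemplar shows the intermediate reasoning before its final answer, so the model imitates that pattern on the new question (the exemplar below paraphrases the tennis-ball example from the paper):

```python
# Few-shot CoT exemplars: each question is paired with worked-out reasoning, not just an answer.
EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]

def cot_prompt(question: str) -> str:
    """Build a chain-of-thought prompt: reasoning-annotated exemplars, then the new question."""
    parts = [f"Q: {q}\nA: {rationale}" for q, rationale in EXEMPLARS]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(cot_prompt("The cafeteria had 23 apples. They used 20 to make lunch and "
                 "bought 6 more. How many apples do they have?"))
```

The only difference from standard few-shot prompting is the content of the exemplars: answers are preceded by their reasoning chains, which is what elicits step-by-step reasoning at sufficient model scale.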