Research
-
[GPT-3] (3/3) Language Models are Few-Shot Learners | Research/NLP_Paper | 2024. 7. 21. 15:58
https://arxiv.org/pdf/2005.14165 | 4. Measuring and Preventing Memorization Of Benchmarks: Since our training dataset is sourced from the internet, it is possible that our model was trained on some of our benchmark test sets. Accurately detecting test contamination from internet-scale datasets is a new area of research without established best practices. While it is common practice to train large mo..
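The contamination analysis in this section boils down to checking whether benchmark examples share long n-grams with the pretraining corpus. Below is a minimal sketch of such a check, assuming a 13-gram window; the function names and toy data are illustrative, not the paper's actual pipeline.

```python
# Hedged sketch: flag benchmark examples that share long n-grams with the
# training corpus, in the spirit of the overlap analysis in Section 4.
# The 13-gram window and all names/toy strings are illustrative assumptions.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Lowercased whitespace n-grams of a document."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def build_train_index(train_docs, n: int = 13) -> set[tuple[str, ...]]:
    """Union of all n-grams seen in the (assumed) training corpus."""
    index = set()
    for doc in train_docs:
        index |= ngrams(doc, n)
    return index

def is_potentially_contaminated(example: str, train_index, n: int = 13) -> bool:
    """True if the test example shares at least one n-gram with the training data."""
    return not ngrams(example, n).isdisjoint(train_index)

# Purely hypothetical usage with toy data:
train_index = build_train_index(["some scraped web page text ..."])
flagged = [ex for ex in ["a benchmark test question ..."]
           if is_potentially_contaminated(ex, train_index)]
```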
-
[GPT-3] (2/3) Language Models are Few-Shot Learners | Research/NLP_Paper | 2024. 7. 21. 12:35
https://arxiv.org/pdf/2005.14165 | 3. Results: In Figure 3.1 we display training curves for the 8 models described in Section 2. For this graph we also include 6 additional extra-small models with as few as 100,000 parameters. As observed in [KMH+20], language modeling performance follows a power-law when making efficient use of training compute. After extending this trend by two more orders of magni..
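The power-law trend cited from [KMH+20] says validation loss falls roughly as L(C) = (C_c / C)^alpha as training compute C grows. Here is a minimal sketch of fitting that form as a straight line in log-log space; the (compute, loss) measurements below are synthetic, only the functional form comes from the papers.

```python
# Hedged sketch: fit L(C) = (Cc / C)**alpha by linear regression in log-log space.
# The data points are made up for illustration; only the power-law form is cited.
import numpy as np

compute = np.array([1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2])   # e.g. PF-days (synthetic)
loss    = np.array([5.2, 4.3, 3.6, 3.0, 2.5, 2.1])      # synthetic validation loss

# log10 L = -alpha * log10 C + alpha * log10 Cc, so fit a line and read off both.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), deg=1)
alpha = -slope                       # exponent of the power law
c_crit = 10 ** (intercept / alpha)   # compute scale Cc implied by the fit
print(f"alpha ~= {alpha:.3f}, Cc ~= {c_crit:.2e}")
```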
-
[GPT-3] (1/3) Language Models are Few-Shot Learners | Research/NLP_Paper | 2024. 7. 21. 11:16
https://arxiv.org/pdf/2005.14165 | Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally pe..
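In the few-shot setting the abstract leads up to, the model is simply conditioned on a handful of demonstrations written into the prompt, with no gradient updates. Below is a minimal sketch of assembling such a prompt; the translation task and template are illustrative (the demonstrations echo the paper's English-to-French example), not a fixed API.

```python
# Hedged sketch: build a K-shot in-context prompt (no gradient updates),
# in the spirit of the few-shot setting the paper describes.
# The task, examples, and template are illustrative assumptions.

def build_few_shot_prompt(demonstrations, query, instruction="Translate English to French:"):
    lines = [instruction, ""]
    for src, tgt in demonstrations:        # K labeled demonstrations in the context
        lines.append(f"English: {src}")
        lines.append(f"French: {tgt}")
        lines.append("")
    lines.append(f"English: {query}")      # the actual test input
    lines.append("French:")                # the model continues from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    demonstrations=[("sea otter", "loutre de mer"), ("cheese", "fromage")],
    query="peppermint",
)
print(prompt)
```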
-
[GPT-2] Language Models are Unsupervised Multitask Learners | Research/NLP_Paper | 2024. 7. 20. 22:06
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf | Abstract: Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit s..
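GPT-2's central claim is that a task can be specified purely in text, so the model infers it from the conditioning context rather than from labeled training pairs. A minimal sketch of that framing, assuming a generic `generate` function as a stand-in for autoregressive sampling; the prompt patterns are illustrative (the "TL;DR:" cue for summarization is the one the paper actually uses).

```python
# Hedged sketch: casting tasks as plain text conditioning, in the spirit of
# GPT-2's p(output | input, task) framing. `generate` is an assumed stand-in
# for any autoregressive language model's sampling function.

def summarize_zero_shot(generate, article: str) -> str:
    # Summaries elicited by appending "TL;DR:" after the article text.
    return generate(article + "\nTL;DR:")

def translate_zero_shot(generate, english_sentence: str) -> str:
    # Translation phrased as a text pattern rather than a supervised task;
    # the exact pattern here is an illustrative assumption.
    return generate(f"english: {english_sentence}\nfrench:")
```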
-
[BERT] Pre-training of Deep Bidirectional Transformers for Language Understanding | Research/NLP_Paper | 2024. 7. 20. 13:51
https://arxiv.org/pdf/1810.04805 | Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right ..
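The bidirectional pretraining the abstract mentions rests on a masked language model: a fraction of input tokens is hidden and predicted from context on both sides. Below is a minimal sketch of the corruption step, assuming the commonly cited 15% selection rate and 80/10/10 replacement split; the token lists are toy data.

```python
# Hedged sketch of BERT-style masked-LM corruption: roughly 15% of positions
# become prediction targets; each selected position is replaced by [MASK] 80%
# of the time, a random token 10%, or left unchanged 10%. Toy vocabulary only.
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15):
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok                         # position the model must predict
            r = random.random()
            if r < 0.8:
                corrupted[i] = mask_token            # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)  # 10%: random token
            # else 10%: keep the original token
    return corrupted, targets

corrupted, targets = mask_tokens(
    ["the", "man", "went", "to", "the", "store"],
    vocab=["the", "man", "went", "to", "store", "dog", "ran"],
)
```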
-
[GPT] Improving Language Understanding by Generative Pre-Training | Research/NLP_Paper | 2024. 7. 20. 09:06
https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf | Abstract: Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making ..
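The recipe here is generative pre-training of a language model on unlabeled text followed by discriminative fine-tuning on each target task, with the language-modeling loss kept as an auxiliary term during fine-tuning. As I read the paper's formulation, the objectives look roughly like this (U is the unlabeled corpus, C the labeled dataset, lambda the auxiliary weight):

```latex
% Hedged sketch of the objectives in the paper's notation.
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)
\qquad
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)
\qquad
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```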
-
[Transformer] Attention Is All You Need | Research/NLP_Paper | 2024. 7. 19. 09:10
※ https://arxiv.org/pdf/1706.03762 | Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing wit..
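The attention mechanism the abstract refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. Below is a minimal single-head sketch in NumPy; the shapes and random inputs are illustrative.

```python
# Hedged sketch of scaled dot-product attention, the building block behind
# the Transformer: softmax(Q K^T / sqrt(d_k)) V for a single head, no masking.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # toy shapes: 4 tokens, d_k = 8
out = scaled_dot_product_attention(Q, K, V)            # shape (4, 8)
```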
-
Complex Reasoning | Research/NLP_CMU | 2024. 7. 11. 07:20
※ Summaries written after taking the 「Advanced NLP - Carnegie Mellon University」 course. https://www.youtube.com/watch?v=mPd2hFmzjWE&list=PL8PYTP1V4I8DZprnWryM4nR8IZl1ZXDjg&index=19 | https://phontron.com/class/anlp2024/assets/slides/anlp-21-reasoning.pdf | What is reasoning? The basic idea is using evidence and logic to arrive at conclusions and make judgements. From the philosophical standpoint, there are two var..