All Posts
-
[GPT-3] (1/3) Language Models are Few-Shot Learners | Research/NLP_Paper | 2024. 7. 21. 11:16
https://arxiv.org/pdf/2005.14165
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally pe..
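The key idea in the entry above is few-shot, in-context learning: instead of fine-tuning, the model is conditioned on a handful of demonstrations placed directly in its prompt. A minimal sketch of how such a prompt can be assembled, with hypothetical translation pairs and a generic completion endpoint assumed (not the paper's actual evaluation code):

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Format a few-shot prompt: task description, K demonstrations, then the new query.
    No gradient updates are involved; any 'learning' happens purely in-context."""
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines += [f"English: {source}", f"French: {target}", ""]
    lines += [f"English: {query}", "French:"]   # the LM is expected to continue from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],   # hypothetical demonstrations
    "plush giraffe",
)
print(prompt)   # pass this string to any autoregressive LM's text-completion interface
```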
-
[GPT-2] Language Models are Unsupervised Multitask Learners | Research/NLP_Paper | 2024. 7. 20. 22:06
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Abstract: Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit s..
-
[BERT] Pre-training of Deep Bidirectional Transformers for Language Understanding | Research/NLP_Paper | 2024. 7. 20. 13:51
https://arxiv.org/pdf/1810.04805
Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right ..
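The phrase "jointly conditioning on both left and right context" is easiest to see with a masked-token prediction. A minimal sketch using the Hugging Face transformers library (an external dependency assumed here, not code from the paper; it downloads pretrained weights on first run):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM   # pip install torch transformers

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Words on both sides of [MASK] ("capital of France" and "is a beautiful city")
# feed into the prediction, which is what "deep bidirectional" refers to.
inputs = tokenizer("The capital of France, [MASK], is a beautiful city.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))   # candidate fillers, e.g. 'paris'
```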
-
[GPT] Improving Language Understanding by Generative Pre-Training | Research/NLP_Paper | 2024. 7. 20. 09:06
https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Abstract: Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making ..
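The way those abundant unlabeled corpora get used is generative pre-training: maximizing the likelihood of each token given the ones before it. A toy sketch of that next-token cross-entropy objective in PyTorch, with random logits standing in for the Transformer decoder (an illustration of the objective only, not the paper's training code):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8                           # toy sizes for illustration
tokens = torch.randint(0, vocab_size, (1, seq_len))    # one unlabeled text sequence
logits = torch.randn(1, seq_len, vocab_size)           # stand-in for decoder outputs

# Generative pre-training objective: maximize sum_i log P(u_i | u_1 .. u_{i-1}),
# i.e. minimize cross-entropy between the prediction at position i and token i+1.
predictions = logits[:, :-1, :].reshape(-1, vocab_size)
targets = tokens[:, 1:].reshape(-1)
loss = F.cross_entropy(predictions, targets)
print(loss.item())   # the quantity minimized during unsupervised pre-training
```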
-
[Transformer] Attention Is All You Need | Research/NLP_Paper | 2024. 7. 19. 09:10
※ https://arxiv.org/pdf/1706.03762
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing wit..
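The attention mechanism the abstract refers to is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A small NumPy sketch of a single head (shapes and values are illustrative, not taken from the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity, shape (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key positions
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, d_k = 8
K = rng.normal(size=(4, 8))   # 4 key positions
V = rng.normal(size=(4, 8))   # 4 value vectors, d_v = 8
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)
```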
-
Complex Reasoning | Research/NLP_CMU | 2024. 7. 11. 07:20
※ Summaries after taking the 「Advanced NLP - Carnegie Mellon University」 course
https://www.youtube.com/watch?v=mPd2hFmzjWE&list=PL8PYTP1V4I8DZprnWryM4nR8IZl1ZXDjg&index=19
https://phontron.com/class/anlp2024/assets/slides/anlp-21-reasoning.pdf
What is reasoning? The basic idea is using evidence and logic to arrive at conclusions and make judgements. From the philosophical standpoint, there are two var..
-
Code Generation | Research/NLP_CMU | 2024. 7. 10. 13:04
※ Summaries after taking the 「Advanced NLP - Carnegie Mellon University」 course
https://www.youtube.com/watch?v=bN2ZZieBXsE&list=PL8PYTP1V4I8DZprnWryM4nR8IZl1ZXDjg&index=16
https://phontron.com/class/anlp2024/assets/slides/anlp-17-codegen.pdf
I'm going to be talking about code generation. This is a research topic that I've worked on for a long time and like a lot, and it has become very useful nowadays ..
-
Large Language Models | Research/NLP_CMU | 2024. 7. 9. 15:20
※ Summaries after taking the 「Advanced NLP - Carnegie Mellon University」 course
https://www.youtube.com/watch?v=2rOSrDtg7HQ
https://phontron.com/class/anlp2024/assets/slides/anlp-15-tourofllms.pdf
I'll be talking about a tour of modern LLMs. The idea here is that there are many, many large language models available nowadays, but I wanted to go through some of the ones that are particularly interesting..