Research/NLP_Paper
-
[Project Proposal] Improving the performance of machine-generated text (MGT) detection by identifying the significance of individual tokens (2024. 11. 11. 14:49)
※ Under revision..!! ※ This is the project proposal for Team 5 in the 2024 NLP class. ※ The main idea for this project was provided by D.H. Lee. ※ The content of this proposal is based on discussions with our team members: S.J. Kim, D.H. Lee, S.J. Lee, and S.Y. Park. ※ The final proposal PPT will be created in collaboration with S.J. Lee. ※ The paper review presentation will be given by S.J. Kim. ※ The proposal & project..
-
Causal Interpretation of Self-Attention in Pre-Trained Transformers (2024. 11. 11. 10:38)
https://arxiv.org/pdf/2310.20307 (Oct 2023, NeurIPS) ※ 2024 NLP class team project's subject. Abstract: We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a ..
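A loose illustration of that framing, under my own assumptions rather than the paper's estimator: a single attention head turns token embeddings into a row-stochastic matrix, and each row can be read as the coefficients of a linear structural equation for that token (query/key projections are omitted here for brevity).

```python
# Loose illustration (not the paper's estimator): each row of the attention
# matrix A can be read as coefficients of a linear structural equation
#   z_i = sum_j A[i, j] * v_j
# i.e. an estimated dependency structure over the input tokens.
import numpy as np

def attention_as_dependency_matrix(X):
    """X: (n_tokens, d) token embeddings; returns the (n, n) attention matrix."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                # query/key projections omitted for brevity
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    return A / A.sum(axis=1, keepdims=True)      # row-stochastic weights over "parent" tokens

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(6, 8))             # 6 toy tokens of dimension 8
    print(np.round(attention_as_dependency_matrix(tokens), 2))
```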
-
Enhancing Machine-Generated Text Detection: Adversarial Fine-Tuning of Pre-Trained Language Models (2024. 11. 10. 22:17)
※ 2024 NLP class team project's research subject. Abstract: Advances in large language models (LLMs) have revolutionized the natural language processing field. However, the text generated by LLMs can result in various issues, such as fake news, misinformation, and social media spam. In addition, detecting machine-generated text is becoming increasingly difficult because it produces text that resembl..
-
SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting (2024. 9. 25. 00:13)
https://arxiv.org/pdf/2404.14757 I was genuinely amazed to see the ideal form of research I had been picturing in my head realized exactly as a paper. It makes proper use of the strengths of Mamba and attention that I had in mind, and it presents an ingenious solution to the part I had been struggling with: how to combine them (the hybrid form). In other words, it properly combines Mamba's (the SSM's) ability to capture the long-term, stationary structure of a time series with attention's ability to pick up local patterns. But while "hybrid" (combining them) sounds nice, it is actually not easy once you think about a concrete method, and most studies that combine SSMs and Transformers..
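To make the "how to combine" question concrete, here is a minimal sketch under my own assumptions (it is not the SST architecture from the paper): a recurrent branch stands in for the Mamba/SSM expert and summarizes the full history, local multi-head attention looks only at a recent window, and a learned gate mixes the two experts per time step.

```python
# Minimal long/short-range hybrid sketch (assumptions only, not the SST paper's
# architecture): a recurrent branch summarizes the full history (stand-in for the
# SSM/Mamba expert), local self-attention covers a short recent window, and a
# learned gate mixes the two expert outputs per time step.
import torch
import torch.nn as nn

class LongShortHybridBlock(nn.Module):
    def __init__(self, d_model: int = 64, window: int = 16, n_heads: int = 4):
        super().__init__()
        self.window = window
        # stand-in for an SSM/Mamba branch; a GRU keeps the sketch self-contained
        self.long_branch = nn.GRU(d_model, d_model, batch_first=True)
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        long_out, _ = self.long_branch(x)                 # long-range summary of the whole sequence
        recent = x[:, -self.window:, :]                   # short window for local patterns
        local_out, _ = self.local_attn(recent, recent, recent)
        # pad the local output back to full sequence length with zeros
        pad_len = x.size(1) - local_out.size(1)
        pad = x.new_zeros(x.size(0), pad_len, x.size(2))
        local_full = torch.cat([pad, local_out], dim=1)
        # gate decides, per position, how much to trust each expert
        w = torch.sigmoid(self.gate(torch.cat([long_out, local_full], dim=-1)))
        return w * long_out + (1 - w) * local_full

if __name__ == "__main__":
    block = LongShortHybridBlock()
    y = block(torch.randn(2, 96, 64))   # (batch, time steps, d_model)
    print(y.shape)                      # torch.Size([2, 96, 64])
```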
-
[InstructGPT, RLHF] Training Language Models to Follow Instructions with Human Feedback (2024. 8. 12. 01:40)
https://arxiv.org/pdf/2203.02155 Abstract: Making language models bigger does not inherently make them better at following a user’s intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent ..
-
Generating Long Sequences with Sparse Transformers (2024. 7. 28. 01:19)
https://arxiv.org/pdf/1904.10509 Abstract: Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to O(n√n). We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matr..
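A minimal sketch of the intuition behind that O(n√n) figure, under my own assumptions (the paper's exact factorizations differ): with a stride of roughly √n, each query attends to a short local window plus every stride-th earlier position, so each row of the mask has about 2√n nonzeros instead of n.

```python
# Sketch of a strided sparse-attention mask (illustration only, not the official
# Sparse Transformer code). Each query attends to its previous `stride` positions
# plus every stride-th earlier position, giving roughly 2*sqrt(n) attended keys
# per query when stride ~ sqrt(n).
import numpy as np

def strided_sparse_mask(n, stride=None):
    stride = stride or int(np.ceil(np.sqrt(n)))        # stride ~ sqrt(n) (assumption)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - stride + 1): i + 1] = True  # local causal window
        mask[i, np.arange(0, i + 1, stride)] = True    # strided "summary" positions
    return mask

if __name__ == "__main__":
    n = 1024
    m = strided_sparse_mask(n)
    print("avg keys per query:", m.sum() / n, "vs dense:", n)
```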
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2024. 7. 27. 18:12)
https://arxiv.org/pdf/2005.11401 Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architecture..
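A bare-bones retrieve-then-generate loop in the spirit of RAG; `embed`, `doc_index`, and `generate` are hypothetical stand-ins (the paper's actual model couples a dense retriever with a seq2seq generator trained jointly, which this sketch does not attempt).

```python
# Sketch of retrieve-then-generate in the spirit of RAG (illustration only).
# `embed`, `doc_index`, and `generate` are hypothetical stand-ins for a dense
# retriever and a seq2seq generator.
import numpy as np

def rag_answer(question, embed, doc_index, generate, top_k=5):
    # 1) dense retrieval: score every document against the question embedding
    q_vec = embed(question)                        # shape: (d,)
    scores = doc_index["vectors"] @ q_vec          # shape: (num_docs,)
    top_ids = np.argsort(-scores)[:top_k]
    passages = [doc_index["texts"][i] for i in top_ids]
    # 2) generation conditioned on the question plus the retrieved passages
    context = "\n".join(passages)
    return generate(f"question: {question}\ncontext: {context}")
```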
-
Self-Consistency Improves Chain of Thought Reasoning in Language Models (2024. 7. 27. 14:31)
https://arxiv.org/pdf/2203.11171 Abstract: Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the gre..
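The decoding strategy is simple enough to sketch; `sample_chain_of_thought` below is a hypothetical stand-in for one sampled LLM completion that returns a reasoning string plus a final answer, and the temperature and sample count are arbitrary choices.

```python
# Sketch of self-consistency decoding: sample several chain-of-thought
# completions at non-zero temperature and keep the most frequent final answer.
# `sample_chain_of_thought` is a hypothetical stand-in for an LLM call that
# returns (reasoning_text, final_answer).
from collections import Counter

def self_consistent_answer(prompt, sample_chain_of_thought,
                           num_samples=10, temperature=0.7):
    answers = []
    for _ in range(num_samples):
        _reasoning, answer = sample_chain_of_thought(prompt, temperature)
        answers.append(answer.strip())
    # marginalize over reasoning paths by majority vote on the final answer
    return Counter(answers).most_common(1)[0][0]
```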