NLP/extra
-
#1 Summaries on Efficient Attentions (NLP/extra, 2024. 9. 14. 02:09)
(1st version - Sep 14, 2024 @Soyoung, Park) Intro: Before diving into the content, I'd like to mention a few things. First, this is my first attempt to summarize the various lines of research on making Transformers efficient, so it's really hard to cover exhaustively all of the great works that make Transformers efficient. These works can be categorized in many different ways and moreover, broadly, we can ev..
-
Simplifying S4 (NLP/extra, 2024. 9. 12. 22:26)
https://hazyresearch.stanford.edu/blog/2022-06-11-simplifying-s4
One goal of deep learning research is to find the simplest architectures that lead to the amazing results that we've seen for the last few years. In that spirit, we discuss the recent S4 architecture, which we think is simple—the structured state space model at its heart has been the most basic building block for generations of elec..
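To make the "structured state space model at its heart" concrete, here is a minimal sketch of a plain discrete linear SSM (my own illustration, not code from the post): a hidden state is updated as x_k = A x_{k-1} + B u_k and read out as y_k = C x_k. The matrix names A, B, C follow standard SSM notation; the state size and random values below are arbitrary assumptions, and a real S4 layer gives A special structure.

```python
import numpy as np

# Toy discrete linear state space model (illustration only):
#   x_k = A @ x_{k-1} + B * u_k   (state update)
#   y_k = C @ x_k                 (readout)
N = 4                                     # state dimension (arbitrary)
rng = np.random.default_rng(0)
A = rng.normal(size=(N, N)) * 0.1         # small values keep the toy state stable
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))

def ssm_recurrence(u):
    """Run the SSM over a 1-D input sequence u, one step at a time."""
    x = np.zeros((N, 1))
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append((C @ x).item())
    return np.array(ys)

u = np.sin(np.linspace(0, 2 * np.pi, 16))  # toy input signal
print(ssm_recurrence(u).shape)             # (16,)
```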
-
Mamba: The Easy Way (NLP/extra, 2024. 9. 12. 15:34)
https://jackcook.com/2024/02/23/mamba.html#fn1
Today, basically any language model you can name is a Transformer model. OpenAI’s ChatGPT, Google’s Gemini, and GitHub’s Copilot are all powered by Transformers, to name a few. However, Transformers suffer from a fundamental flaw: they are powered by Attention, which scales quadratically with sequence length. Simply put, for quick exchanges (asking C..
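The "scales quadratically with sequence length" point is easy to see in code: self-attention forms an n-by-n score matrix between every pair of positions, so both compute and memory grow with n². Below is a minimal single-head sketch (my own illustration; the shapes and the names Wq, Wk, Wv are assumptions, not any particular model's implementation).

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over an input x of shape (n, d)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # shape (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V

n, d = 1024, 64
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)   # (1024, 64); the intermediate score matrix was (1024, 1024)
```

Doubling the sequence length quadruples the size of that score matrix, which is the bottleneck Mamba-style models are designed to avoid.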
-
Brief Summary of Prior Studies before Mamba 2 (NLP/extra, 2024. 9. 9. 18:21)
※ (1st version - Sep 10, 2024 @Soyoung, Park) This is my first summary of Mamba-related works, so it's not complete and is written without a clear distinction between my own writing and content cited from the papers. ※ I'm planning to work on it again later. 1. Long Range Arena: A Benchmark for Efficient Transformers (2020) https://arxiv.org/pdf/2011.04006 "Transformers do not scale very well to long seq..
-
The Annotated S4 (NLP/extra, 2024. 9. 9. 14:18)
Efficiently Modeling Long Sequences with Structured State Spaces
https://srush.github.io/annotated-s4/
Blog Post and Library by Sasha Rush and Sidd Karamcheti, v3. The Structured State Space for Sequence Modeling (S4) architecture is a new approach to very long-range sequence modeling tasks for vision, language, and audio, showing a capacity to capture dependencies over tens of thousands of steps. ..
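One reason S4 can train over tens of thousands of steps is that a linear time-invariant SSM can be unrolled into a single long convolution y = K * u with kernel K = [CB, CAB, CA²B, ...], computed once per layer. The toy sketch below (my own illustration; the matrices are random assumptions, and the naive kernel loop stands in for S4's structured, FFT-based algorithm) just checks that the recurrent and convolutional views give the same outputs.

```python
import numpy as np

# Toy check: the SSM recurrence  x_k = A x_{k-1} + B u_k,  y_k = C x_k
# equals one long convolution with kernel K = [CB, CAB, CA^2 B, ...].
rng = np.random.default_rng(0)
N, L = 4, 64
A = rng.normal(size=(N, N)) * 0.1
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)

# Recurrent view: one state update per input element.
x, y_rec = np.zeros((N, 1)), []
for u_k in u:
    x = A @ x + B * u_k
    y_rec.append((C @ x).item())

# Convolutional view: build the kernel once, then convolve the whole input.
K, Ak = [], np.eye(N)
for _ in range(L):
    K.append((C @ Ak @ B).item())
    Ak = A @ Ak
y_conv = np.convolve(u, np.array(K))[:L]

print(np.allclose(y_rec, y_conv))   # True: both views compute the same sequence
```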
-
MAMBA and State Space Models Explained (NLP/extra, 2024. 6. 1. 13:53)
https://athekunal.medium.com/mamba-and-state-space-models-explained-b1bf3cb3bb77
This article will go through a new class of deep learning models called Structured State Spaces and Mamba:
1. Transformer and RNN plus their issues and positive sides
2. S4 models and architecture details
3. Mamba architecture
RNN: Before transformers were introduced, we did sequence modelling using recurrent neural netwo..
-
A Visual Guide to Mamba and State Space Models (NLP/extra, 2024. 5. 30. 17:32)
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
An Alternative to Transformers for Language Modeling. The Transformer architecture has been a major component in the success of Large Language Models (LLMs). It has been used for nearly all LLMs that are being used today, from open-source models like Mistral to closed-source models like ChatGPT. To further improve LLMs,..