All categories
-
Contrastive Learning
Research/Multimodal 2024. 8. 14. 14:03
https://neurips.cc/media/neurips-2021/Slides/21895.pdf
https://sthalles.github.io/simple-self-supervised-learning/
https://sanghyu.tistory.com/184
What is Self-Supervised Learning?
Self-Supervised Learning (SSL) is a special type of representation learning that enables learning good data representations from an unlabelled dataset. It is motivated by the idea of constructing supervised learning tasks out..
-
[CLIP] Connecting text and images
Research/Multimodal 2024. 8. 13. 22:01
https://openai.com/index/clip/
We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3. Although deep learning has revolutionized computer vi..
-
[InstructGPT, RLHF] Training Language Models to Follow Instructions with Human Feedback
Research/NLP_Paper 2024. 8. 12. 01:40
https://arxiv.org/pdf/2203.02155
Abstract
Making language models bigger does not inherently make them better at following a user’s intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent ..
-
PPO & RLHF & DPO
Research/RL_DeepMind 2024. 8. 11. 16:11
https://www.youtube.com/watch?v=SgC6AZss478&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=1
https://www.youtube.com/watch?v=TjHH_--7l8g&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=2
https://www.youtube.com/watch?v=Z_JUqJBpVOk&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=3
https://www.youtube.com/watch?v=k2pD3k1485A&list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-&index=4
The idea is that we have a Trans..
-
[6/6] Policy Gradient Methods
Research/RL_DeepMind 2024. 8. 10. 17:23
https://www.youtube.com/watch?v=e20EY4tFC_Q&list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr&index=6
Policy gradient methods take a more direct approach to the problem statement of RL and, as a result, many of the most effective models are from this category. For example, Proximal Policy Optimization is a type of policy gradient method, and it is OpenAI's go-to RL algorithm. In fact, that's what they use t..