All Posts
-
[DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Research/..... | 2024. 12. 14. 12:08
https://arxiv.org/pdf/2305.18290
May 2023 (NeurIPS 2023)
Abstract: While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generati..
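The preview cuts off before the method, but the paper's central result is a simple closed-form preference loss on the policy/reference log-ratio. A minimal PyTorch-style sketch of that objective, assuming per-sequence log-probabilities for the chosen and rejected completions have already been computed (the tensor names here are mine, not the paper's):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO: the policy/reference log-ratio acts as an implicit reward.

    All inputs are per-sequence log-probabilities of shape (batch,).
    """
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```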
-
[PPO] Proximal Policy Optimization Algorithms
Research/..... | 2024. 12. 13. 14:42
https://arxiv.org/pdf/1707.06347
(Aug 2017)
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel ob..
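The “surrogate” the abstract mentions is PPO's clipped-ratio objective, which is what lets it safely take multiple gradient steps per batch of samples. A minimal sketch, assuming advantages and old-policy log-probabilities come from a rollout buffer (names are illustrative):

```python
import torch

def ppo_clip_loss(logps, old_logps, advantages, clip_eps=0.2):
    """Clipped surrogate objective L^CLIP from the PPO paper.

    logps, old_logps, advantages: tensors of shape (batch,).
    Returns a loss to minimize (the negated surrogate objective).
    """
    ratio = torch.exp(logps - old_logps)  # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # taking the min makes the objective a pessimistic lower bound
    return -torch.min(unclipped, clipped).mean()
```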
-
[DPG] Distributional Reinforcement Learning for Energy-Based Sequential Models
Research/... | 2024. 12. 12. 00:12
https://arxiv.org/pdf/1912.08517
Abstract: Global Autoregressive Models (GAMs) are a recent proposal [15] for exploiting global properties of sequences for data-efficient learning of seq2seq models. In the first phase of training, an Energy-Based model (EBM) [10] over sequences is derived. This EBM has high representational power, but is unnormalized and cannot be directly exploited for sampling. T..
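The truncated part describes the second phase: distilling the unnormalized EBM P(x) into an autoregressive policy. The paper's Distributional Policy Gradient does this with importance weights P(x)/q(x) against a proposal q (periodically refreshed to the current policy). A rough sketch of one update, with all object interfaces assumed for illustration:

```python
import torch

def dpg_step(policy, proposal, ebm_log_score, optimizer, n_samples=64):
    """One off-policy DPG update, pushing pi_theta toward p(x) ∝ P(x).

    Assumed interfaces (not a real library): proposal.sample(n) returns
    sequences, policy/proposal .log_prob(x) return per-sequence log-probs,
    and ebm_log_score(x) returns log P(x) for the unnormalized EBM.
    """
    xs = proposal.sample(n_samples)
    with torch.no_grad():
        # importance weight P(x) / q(x), computed in log space
        w = torch.exp(ebm_log_score(xs) - proposal.log_prob(xs))
    # estimate of E_q[(P(x)/q(x)) * grad log pi_theta(x)], negated for SGD
    loss = -(w * policy.log_prob(xs)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```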
-
A thought that just hit me.. isn't a GAN a kind of RL?
Campus Life | 2024. 12. 11. 13:49
In a GAN, the discriminator judges the images the generator produces as REAL/FAKE, and the generator learns from that signal. So if you view the signal the discriminator gives as a kind of reward, and the generator that learns from it as the agent, isn't that interaction loop a kind of reinforcement? If there's a hole in this reasoning, someone please tell me.. haha. An echo that never comes back.... lolllll who am I even talking to.. hehe ㅠㅠ
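The analogy can in fact be made literal: if the generator outputs discrete samples, you can train it with REINFORCE, using the discriminator's score as the reward and never backpropagating through D at all (this is roughly what SeqGAN does for text). A toy sketch with assumed interfaces, not any particular paper's code:

```python
import torch

def reinforce_generator_step(generator, discriminator, optimizer, batch=32):
    """Treat the generator as a policy and D's verdict as the reward.

    Assumed interfaces for this sketch: generator.sample(n) returns
    (samples, log_probs) with log_probs differentiable, and
    discriminator(samples) returns P(REAL) in [0, 1] per sample.
    """
    samples, log_probs = generator.sample(batch)
    with torch.no_grad():
        reward = discriminator(samples)   # "how real do I look?"
        baseline = reward.mean()          # simple variance reduction
    # REINFORCE: maximize E[reward] <=> minimize -(reward - b) * log pi
    loss = -((reward - baseline) * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The main difference from standard RL is that the "reward" is non-stationary: the discriminator is being trained against the generator at the same time.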
-
A Distributional Approach to Controlled Text Generation
Research/... | 2024. 12. 9. 23:44
https://arxiv.org/pdf/2012.11635
https://github.com/naver/gdc
May 2021 (ICLR 2021)
Abstract: We propose a Distributional Approach for addressing Controlled Text Generation from pre-trained Language Models (LMs). This approach permits specifying, in a single formal framework, both “pointwise” and “distributional” constraints over the target LM — to our knowledge, the first model with such generality —..
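If I recall the paper correctly, the single formal framework represents the constrained target as an energy-based model on top of the pretrained LM, P(x) ∝ a(x)·exp(λ·φ(x)), where a is the original LM, φ(x) are constraint features, and λ is fit so the features' expectations match their targets (a pointwise constraint is the special case where φ is binary with target moment 1). A tiny scoring sketch under those assumptions:

```python
import torch

def ebm_log_score(x, lm_logprob, features, lam):
    """log P(x) = log a(x) + lam . phi(x) for a GDC-style EBM.

    Assumed interfaces: lm_logprob(x) is the sequence log-probability
    under the pretrained LM a; features(x) is the constraint feature
    vector phi(x), e.g. 1.0 if x satisfies a pointwise constraint else 0.0.
    """
    return lm_logprob(x) + torch.dot(lam, features(x))
```

Sampling from this EBM is the hard part; the paper distills it into a sampleable policy with the DPG algorithm from the [DPG] post above.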
-
[MaPLe] Multi-modal Prompt Learning
Research/NLP_YS2024 | 2024. 12. 5. 21:27
https://arxiv.org/pdf/2210.03117
https://github.com/muzairkhattak/multimodal-prompt-learning
(CVPR 2023)
Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the Natural Language Proc..