All Posts
-
[PPO] Proximal Policy Optimization Algorithms | Research/..... | 2024. 12. 13. 14:42
https://arxiv.org/pdf/1707.06347 (Aug 2017)
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel ob..
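The "surrogate" objective the abstract refers to is PPO's clipped loss. A minimal PyTorch sketch of it (not the paper's reference implementation; the tensor names `logp_new`, `logp_old`, `advantages` and the clip range are my own choices):

```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective from the PPO paper, negated for gradient descent."""
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(logp_new - logp_old)
    # Unclipped and clipped surrogate terms
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (minimum) surrogate; negate to minimize
    return -torch.min(surr1, surr2).mean()
```

Because the ratio is clipped, the same batch of samples can safely be reused for multiple epochs of minibatch updates, which is exactly the "multiple gradient updates per sample" contrast the abstract draws with vanilla policy gradients.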
-
[DPG] Distributional Reinforcement Learning for Energy-Based Sequential Models | Research/... | 2024. 12. 12. 00:12
https://arxiv.org/pdf/1912.08517
Abstract: Global Autoregressive Models (GAMs) are a recent proposal [15] for exploiting global properties of sequences for data-efficient learning of seq2seq models. In the first phase of training, an Energy-Based Model (EBM) [10] over sequences is derived. This EBM has high representational power, but is unnormalized and cannot be directly exploited for sampling. T..
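The excerpt cuts off where the paper introduces the second phase: distilling the unnormalized EBM P(x) into an autoregressive policy. A hedged sketch of the importance-sampled, DPG-style update that phase relies on; `log_P`, `log_q`, and `log_pi_theta` are assumed callables, not names from the paper's code:

```python
import torch

def dpg_step(seqs, log_P, log_q, log_pi_theta, optimizer):
    """One importance-sampled policy-gradient step toward the EBM distribution.

    seqs:         batch of sequences sampled from the proposal q.
    log_P:        unnormalized EBM log-score, log P(x).
    log_q:        proposal log-probability, log q(x).
    log_pi_theta: trainable policy log-probability (requires grad).
    """
    optimizer.zero_grad()
    # Importance weights P(x) / q(x), detached: they only scale the gradient
    weights = torch.exp(log_P(seqs) - log_q(seqs)).detach()
    # Estimator of E_q[(P/q) * grad log pi_theta], negated so a minimizer ascends
    loss = -(weights * log_pi_theta(seqs)).mean()
    loss.backward()
    optimizer.step()
```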
-
A thought that just hit me.. isn't a GAN a kind of RL? | Campus Life | 2024. 12. 11. 13:49
In a GAN, the generator produces an image, the discriminator labels it REAL/FAKE, and the generator learns from that signal. So if we view the discriminator's signal as a kind of reward, and the generator learning from it as an agent, isn't this interaction loop a form of reinforcement? If there's a hole in this reasoning, someone please let me know.. haha. An echo that never comes back.... lololol who am I even talking to.. haha ㅠㅠ
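For what it's worth, the analogy can be made literal: treat the discriminator's output as a scalar reward and update the generator with a REINFORCE-style estimator instead of backpropagating through D. A toy sketch (every name here is a placeholder, not from any particular paper):

```python
import torch
from torch.distributions import Normal

def reinforce_generator_step(G, D, optimizer, z, sigma: float = 0.1):
    """Treat D's score as a reward and update G with REINFORCE.

    G: maps noise z to the mean of a Gaussian "action" (the fake image).
    D: maps an image to a realness score in (0, 1).
    """
    optimizer.zero_grad()
    mean = G(z)                          # generator proposes an image
    policy = Normal(mean, sigma)         # stochastic "policy" over images
    fake = policy.sample()               # sampled action; no gradient path to G
    reward = D(fake).detach().squeeze()  # discriminator output as reward signal
    # REINFORCE: raise the log-prob of samples in proportion to their reward
    logp = policy.log_prob(fake).flatten(1).sum(dim=1)
    loss = -(reward * logp).mean()
    loss.backward()
    optimizer.step()
```

This is roughly the trick SeqGAN uses for discrete tokens, where sampling blocks backpropagation entirely, so the intuition does hold up; the usual GAN update just exploits the fact that for continuous outputs you can differentiate through D directly, which is a lower-variance signal than a scalar reward.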
-
A Distributional Approach to Controlled Text Generation | Research/... | 2024. 12. 9. 23:44
https://arxiv.org/pdf/2012.11635
https://github.com/naver/gdc
May 2021 (ICLR 2021)
Abstract: We propose a Distributional Approach for addressing Controlled Text Generation from pre-trained Language Models (LMs). This approach permits to specify, in a single formal framework, both “pointwise” and “distributional” constraints over the target LM — to our knowledge, the first model with such generality —..
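Past where the excerpt cuts off, the paper casts the constrained target as an EBM combining the original LM a(x) with constraint features φ(x), roughly P(x) = a(x)·exp(λ·φ(x)). A minimal sketch of that unnormalized log-score for a single sequence; `log_a` and `phi` are assumed callables:

```python
import torch

def ebm_log_score(x, log_a, phi, lam: torch.Tensor) -> torch.Tensor:
    """Unnormalized log P(x) = log a(x) + lambda . phi(x) for one sequence x.

    log_a: log-probability of x under the pretrained LM a(.).
    phi:   1-D vector of constraint features (e.g. binary pointwise checks,
           or features whose expectation a distributional constraint targets).
    lam:   multipliers fit so that E_P[phi] matches the desired moments.
    """
    return log_a(x) + torch.dot(lam, phi(x))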
-
[MaPLe] Multi-modal Prompt Learning | Research/NLP_YS2024 | 2024. 12. 5. 21:27
https://arxiv.org/pdf/2210.03117
https://github.com/muzairkhattak/multimodal-prompt-learning
(CVPR 2023)
Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the Natural Language Proc..
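The prompt sensitivity the abstract mentions is easy to see firsthand. A small sketch using OpenAI's `clip` package, comparing two hand-written templates on the same image (the image path and templates are placeholders of mine):

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
# Two templates for the same classes; the class probabilities can shift noticeably
for template in ["a photo of a {}.", "a blurry photo of a {}."]:
    texts = clip.tokenize([template.format(c) for c in ["cat", "dog"]]).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, texts)
        probs = logits_per_image.softmax(dim=-1)
    print(template, probs.tolist())
```

Prompt-learning methods like MaPLe replace this manual template search with learned (here, multi-modal) prompt parameters.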
-
[DPLCLIP] Domain Prompt Learning for Efficiently Adapting CLIP to Unseen Domains | Research/NLP_YS2024 | 2024. 12. 5. 15:28
https://arxiv.org/pdf/2111.12853v3
https://github.com/shogi880/DPLCLIP?tab=readme-ov-file
Abstract: Domain generalization (DG) is a difficult transfer learning problem aiming to learn a generalizable model for unseen domains. Recent foundation models (FMs) are robust to many distribution shifts and, therefore, should substantially improve the performance of DG. In this work, we study generic ways t..
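The basic mechanism underneath this line of work is CoOp-style learnable context: free vectors in the text encoder's embedding space stand in for hand-written prompt words. A hedged sketch of just that mechanism (DPLCLIP itself goes further and generates such vectors from each domain's image features; all shapes and names below are illustrative):

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """CoOp-style learnable context vectors prepended to class-name embeddings.

    In DPLCLIP these context vectors would be produced per-domain by a small
    network over image features; here they are plain parameters for clarity.
    """
    def __init__(self, n_ctx: int = 4, dim: int = 512):
        super().__init__()
        # Context vectors live in the text encoder's token-embedding space
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)

    def forward(self, class_embeds: torch.Tensor) -> torch.Tensor:
        # class_embeds: (n_classes, n_tokens, dim) embeddings of the class names
        n_classes = class_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # Prepend the shared learnable context to every class-name embedding
        return torch.cat([ctx, class_embeds], dim=1)
```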