[PPO] Proximal Policy Optimization Algorithms | Research/..... | 2024. 12. 13. 14:42
https://arxiv.org/pdf/1707.06347 (Aug 2017)
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel ob..
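The "novel objective" the preview cuts off is PPO's clipped surrogate. A minimal NumPy sketch (function name is my own; ε = 0.2 matches the paper's default):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate: mean of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)."""
    ratio = np.exp(logp_new - logp_old)             # r_t = pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Taking the min removes the incentive to push the ratio outside [1-eps, 1+eps]
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

# Sanity check: when the new policy equals the old one (all ratios = 1),
# the objective reduces to the mean advantage.
adv = np.array([1.0, -0.5, 2.0])
logp = np.log(np.array([0.3, 0.5, 0.2]))
print(ppo_clip_objective(logp, logp, adv))
```

With identical policies the clipping is inactive, which is why PPO can take several gradient steps on the same batch before the ratio drifts far enough for the clip to kick in.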
[DPG] Distributional Reinforcement Learning for Energy-Based Sequential Models | Research/... | 2024. 12. 12. 00:12
https://arxiv.org/pdf/1912.08517
Abstract: Global Autoregressive Models (GAMs) are a recent proposal [15] for exploiting global properties of sequences for data-efficient learning of seq2seq models. In the first phase of training, an Energy-Based model (EBM) [10] over sequences is derived. This EBM has high representational power, but is unnormalized and cannot be directly exploited for sampling. T..
A thought that just occurred to me.. isn't a GAN a kind of RL? | Campus Life | 2024. 12. 11. 13:49
In a GAN, the generator produces an image, the discriminator judges it REAL/FAKE, and the generator learns from that signal. So if we treat the signal the discriminator gives as a kind of reward, and the generator that learns from it as an agent, isn't this interaction loop a form of reinforcement? If there's a flaw in this reasoning, someone please point it out.. haha. An echo that never returns.... lololololol who am I even talking to.. haha T.T
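For what it's worth, the analogy can be made literal in a toy: treat the generator as a one-parameter Bernoulli policy, the discriminator's REAL-probability as the reward, and update with REINFORCE. Everything below is a made-up illustration, not how image GANs are actually trained (those backprop through the discriminator directly); the RL view is closer to how discrete-sequence GANs work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "real" data is mostly 1s, and a fixed discriminator
# scores a sample by how real it looks (probability of REAL).
def discriminator(x):
    return 0.9 if x == 1 else 0.1

theta = 0.0  # generator parameter: logit of P(generate 1)

for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))   # current policy P(x = 1)
    x = int(rng.random() < p)          # generator "acts" by sampling
    reward = discriminator(x)          # discriminator signal as reward
    grad_logp = x - p                  # d/d theta of log Bernoulli(x; p)
    theta += 0.1 * reward * grad_logp  # REINFORCE update

print(1.0 / (1.0 + np.exp(-theta)))   # generator drifts toward producing 1s
```

The generator never sees the real data, only the scalar reward, which is exactly the RL flavor of the loop: it ends up producing mostly 1s because that is what the discriminator rewards.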
A Distributional Approach to Controlled Text Generation | Research/... | 2024. 12. 9. 23:44
https://arxiv.org/pdf/2012.11635
https://github.com/naver/gdc
May 2021 (ICLR 2021)
Abstract: We propose a Distributional Approach for addressing Controlled Text Generation from pre-trained Language Models (LMs). This approach permits to specify, in a single formal framework, both “pointwise” and “distributional” constraints over the target LM — to our knowledge, the first model with such generality —..
[MaPLe] Multi-modal Prompt Learning | Research/NLP_YS2024 | 2024. 12. 5. 21:27
https://arxiv.org/pdf/2210.03117
https://github.com/muzairkhattak/multimodal-prompt-learning
(CVPR 2023)
Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the Natural Language Proc..
[DPLCLIP] Domain Prompt Learning for Efficiently Adapting CLIP to Unseen Domains | Research/NLP_YS2024 | 2024. 12. 5. 15:28
https://arxiv.org/pdf/2111.12853v3
https://github.com/shogi880/DPLCLIP?tab=readme-ov-file
Abstract: Domain generalization (DG) is a difficult transfer learning problem aiming to learn a generalizable model for unseen domains. Recent foundation models (FMs) are robust to many distribution shifts and, therefore, should substantially improve the performance of DG. In this work, we study generic ways t..
Sometimes I laugh while reading papers | Campus Life | 2024. 12. 5. 10:56
"Alas, the capricious behaviour of machine learning systems out-of-distribution is a roadblock to their deployment in critical applications." Every so often a paper uses an expression I never expected. It isn't meant to be funny, but it's so un-formal that I laugh anyway. Haha. But.. this paper is wearing me out with a really fundamental question.. one that also connects to a doubt I've been carrying lately. Recently I've been wondering whether MLLMs and VLMs actually understand the semantics, or whether training was simply poured in and the learned result is just being output. Do they really have the capacity to understand..