Research
-
Aligning Language Models with Preferences through f-divergence Minimization
Research/... 2024. 12. 23. 07:13
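https://arxiv.org/pdf/2302.08215 (Jun 2023, ICML 2023)

Ta-da! This one is a complete gift set: everything united under f-divergence. Here, take your Christmas present!

It goes without saying, but how you set your goal changes your life completely, and changing the destination can send us in an entirely different direction. In the same way, it is fascinating how a model's behavior changes depending on its objective function (loss function). More concretely:
- by what measure should we approximate the target distribution?
- how does the convergence behavior differ depending on the metric?

Why does reading this paper make me so happy? T_T To read this paper..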
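To make the "which measure" question concrete, here is a minimal sketch (not code from the paper) that computes several f-divergences D_f(P || Q) = sum_x q(x) f(p(x)/q(x)) from their generator functions f. The generators are standard textbook choices; the names `F_GENERATORS` and `f_divergence` and the toy distributions are hypothetical, and the sketch assumes discrete distributions with q(x) > 0 everywhere (and p(x) > 0 for reverse KL).

```python
import math
from typing import Callable, Dict, Sequence


def xlogx(t: float) -> float:
    # Convention: 0 * log(0) = 0
    return 0.0 if t == 0.0 else t * math.log(t)


# Generator functions f (convex, f(1) = 0) for a few common f-divergences.
F_GENERATORS: Dict[str, Callable[[float], float]] = {
    "forward_kl": lambda t: xlogx(t),                 # KL(P || Q)
    "reverse_kl": lambda t: -math.log(t),             # KL(Q || P); raises ValueError if p(x) = 0
    "total_variation": lambda t: 0.5 * abs(t - 1.0),  # TV(P, Q)
    "jensen_shannon": lambda t: 0.5 * (xlogx(t) - (t + 1.0) * math.log((t + 1.0) / 2.0)),
}


def f_divergence(p: Sequence[float], q: Sequence[float], f: Callable[[float], float]) -> float:
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)), assuming q(x) > 0 for every x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


p = [0.7, 0.2, 0.1]  # hypothetical "target" distribution
q = [0.4, 0.4, 0.2]  # hypothetical "model" distribution
for name, f in F_GENERATORS.items():
    print(f"{name:>16}: {f_divergence(p, q, f):.4f}")
```

Only the generator f changes between rows, which is the sense in which forward KL, reverse KL, Jensen-Shannon, and total variation all fit under one f-divergence framework.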
-
[cdpg] Controlling Conditional Language Models without Catastrophic Forgetting
Research/... 2024. 12. 22. 08:49
https://arxiv.org/pdf/2112.00791 (Jun 2022, ICML 2022)
Abstract
Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., halluc..
-
(3/3) GAN, F-Divergence, IPM
Research/... 2024. 12. 22. 00:34
1. Example of Parallel Line Density
2. Wasserstein Distance with GAN
3. Kantorovich-Rubinstein Duality
4. Wasserstein as Primal LP
5. Wasserstein as Dual LP
6. Property of Dual LP on Wasserstein Distance
7. Lipschitz Continuity
8. Dual Problem of Wasserstein Distance
9. Kantorovich-Rubinstein Duality & Wasserstein GAN
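The parallel-line example can be checked numerically. This sketch simplifies it by keeping only the x-coordinate, where P_0 and P_theta become point masses at 0 and theta (the 2D distances are the same: W = |theta| and JS = log 2 for theta != 0). It computes the 1D Wasserstein-1 distance through the CDF identity W1 = integral |F_P(x) - F_Q(x)| dx rather than by solving the primal LP; the helper names are hypothetical.

```python
import math
from typing import Dict

Dist = Dict[float, float]  # support point -> probability mass


def wasserstein_1d(p: Dist, q: Dist) -> float:
    """W1 between two discrete distributions on the real line.

    Uses the 1D identity W1(P, Q) = integral of |F_P(x) - F_Q(x)| dx,
    evaluated exactly on the merged, sorted support.
    """
    support = sorted(set(p) | set(q))
    cdf_p = cdf_q = total = 0.0
    for x, x_next in zip(support, support[1:]):
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        total += abs(cdf_p - cdf_q) * (x_next - x)  # both CDFs are flat on [x, x_next)
    return total


def js_divergence(p: Dist, q: Dist) -> float:
    """JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2."""
    total = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        mx = 0.5 * (px + qx)
        if px > 0:
            total += 0.5 * px * math.log(px / mx)
        if qx > 0:
            total += 0.5 * qx * math.log(qx / mx)
    return total


# x-coordinate of the parallel-line example: P_0 = point mass at 0, P_theta = point mass at theta
for theta in [1.0, 0.1, 0.01]:
    p0, p_theta = {0.0: 1.0}, {theta: 1.0}
    print(f"theta={theta}: W1={wasserstein_1d(p0, p_theta):.4f}, JS={js_divergence(p0, p_theta):.4f}")
```

As theta shrinks, W1 = |theta| goes to 0 continuously, while JS stays at log 2 until theta is exactly 0; that discontinuity is why JS gives no useful gradient for disjoint supports and the Wasserstein distance does.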
-
(1/3) GAN, F-Divergence, IPM
Research/... 2024. 12. 20. 07:37
https://www.youtube.com/playlist?list=PLzZ7PPT4KK5oQ4io5Fead9j_ksLIokrr
1. Loss Function of GAN
2. Jensen Shannon Divergence
3. Training of GAN
4. Theoretical Results of GAN
5. Mode Collapse
6. Conditional Generative Adversarial Network
7. Adding Latent Variable to GAN
8. InfoGAN
9. Comparison between Conditional GAN & InfoGAN
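The GAN loss, Jensen-Shannon divergence, and theoretical-results items rest on one identity: for a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and the value function becomes V(D*, G) = -log 4 + 2 * JSD(p_data || p_g). Below is a minimal numerical check on toy discrete distributions; the distributions and helper names are made up for illustration.

```python
import math
from typing import List, Sequence


def optimal_discriminator(p_data: Sequence[float], p_g: Sequence[float]) -> List[float]:
    # For a fixed G: D*(x) = p_data(x) / (p_data(x) + p_g(x)); assumes p_data(x) + p_g(x) > 0
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]


def gan_value(p_data: Sequence[float], p_g: Sequence[float], d: Sequence[float]) -> float:
    # V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]
    real = sum(pd * math.log(dx) for pd, dx in zip(p_data, d) if pd > 0)
    fake = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d) if pg > 0)
    return real + fake


def kl(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


p_data = [0.5, 0.3, 0.2]  # toy data distribution
p_g = [0.2, 0.3, 0.5]     # toy generator distribution
d_star = optimal_discriminator(p_data, p_g)
print(gan_value(p_data, p_g, d_star))       # V(D*, G)
print(-math.log(4) + 2 * jsd(p_data, p_g))  # -log 4 + 2 * JSD; should match the line above
```

When p_g = p_data, D* = 1/2 everywhere and V(D*, G) reaches its minimum -log 4, which is the global optimum for the generator.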
-
High Variance in Policy gradients
Research/RL_reference 2024. 12. 19. 08:51
https://balajiai.github.io/high_variance_in_policy_gradients
https://github.com/BalajiAI/High-Variance-in-Policy-gradients

1. A rigorous derivation of the baseline
2. GAE (Generalized Advantage Estimation)
- When reading code, you often need extra study because it includes techniques for efficient computation or is built on top of a prior work's code. The PPO algorithm computes the advantage with GAE (Generalized Advantage Estimation), and this is supplementary material for understanding it. (I had studied it before but moved on without really understanding it..)
Tho..
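A minimal sketch of the GAE recursion as it is typically used for PPO-style advantage estimation: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lambda * A_{t+1}, computed backwards over a rollout, with V playing the role of the baseline. This is not the linked repository's code; the function name, the `dones` convention (dones[t] = True means the episode ended at step t, so no bootstrapping from s_{t+1}), and the defaults gamma = 0.99, lam = 0.95 are assumptions, and `dones` indexing differs between codebases.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout of length T.

    values[t] = V(s_t); last_value = V(s_T), used to bootstrap past the rollout.
    dones[t] = True means the episode ended at step t (assumed convention).
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), no bootstrap after a terminal step
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, reset at episode boundaries
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value targets: advantage + baseline
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting lam=0 gives the one-step TD error (lower variance, more bias from V), and lam=1 gives the Monte Carlo return minus the baseline (unbiased given the baseline, higher variance); lambda interpolates between the two, which is the variance-reduction point of this post.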