Research
-
(3/3) GAN, F-Divergence, IPM | Research/... | 2024. 12. 22. 00:34
1. Example of Parallel Line Density
2. Wasserstein Distance with GAN
3. Kantorovich-Rubinstein Duality
4. Wasserstein as Primal LP
5. Wasserstein as Dual LP
6. Property of Dual LP on Wasserstein Distance
7. Lipschitz Continuity
8. Dual Problem of Wasserstein Distance
9. Kantorovich-Rubinstein Duality & Wasserstein GAN
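As a pointer for items 3-9, a minimal LaTeX sketch of the Wasserstein-1 distance in its primal optimal-transport form and its Kantorovich-Rubinstein dual over 1-Lipschitz critics; the notation (P_r for the data distribution, P_\theta for the generator distribution, f for the critic) is assumed here, not taken from the post.

W_1(P_r, P_\theta)
  = \inf_{\gamma \in \Pi(P_r, P_\theta)} \mathbb{E}_{(x, y) \sim \gamma}\left[ \lVert x - y \rVert \right]
  = \sup_{\lVert f \rVert_L \le 1} \left( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_\theta}[f(x)] \right)

The WGAN critic approximates the supremum over 1-Lipschitz f, which is where the Lipschitz constraint in items 6-8 enters.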
-
(1/3) GAN, F-Divergence, IPM | Research/... | 2024. 12. 20. 07:37
https://www.youtube.com/playlist?list=PLzZ7PPT4KK5oQ4io5Fead9j_ksLIokrri
1. Loss Function of GAN
2. Jensen Shannon Divergence
3. Training of GAN
4. Theoretical Results of GAN
5. Mode Collapse
6. Conditional Generative Adversarial Network
7. Adding Latent Variable to GAN
8. InfoGAN
9. Comparison between Conditional GAN & InfoGAN
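For items 1-2, a minimal sketch of the GAN minimax objective and its connection to the Jensen-Shannon divergence, written in the standard notation of Goodfellow et al. (G generator, D discriminator, p_data and p_g data and generator distributions); the symbols are assumed here, not copied from the playlist.

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

With the optimal discriminator D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}, the value collapses to

V(D^*, G) = 2\,\mathrm{JSD}(p_{\mathrm{data}} \parallel p_g) - \log 4,

so training the generator against the optimal discriminator minimizes the Jensen-Shannon divergence between the data and generator distributions.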
-
High Variance in Policy gradients | Research/RL_reference | 2024. 12. 19. 08:51
https://balajiai.github.io/high_variance_in_policy_gradients
https://github.com/BalajiAI/High-Variance-in-Policy-gradients
1. A rigorous derivation of the baseline
2. GAE (Generalized Advantage Estimation)
- When reading code, it is common to run into techniques added for efficient computation, or code built on top of prior work, both of which call for extra study. In the PPO algorithm the advantage is computed with GAE (Generalized Advantage Estimation), and this post is supplementary material for understanding it; a minimal sketch of the computation is given below. (Something I had studied before but moved past without properly understanding it..) Tho..
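To make item 2 concrete, a minimal Python sketch of GAE as it is typically computed in PPO-style code: a backward recursion over the TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). The function name and argument layout are illustrative assumptions, not taken from the linked repository.

import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    # rewards, dones: length-T arrays from one rollout.
    # values: length-(T+1) array; values[T] is the bootstrap value of the final state.
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # zero out the bootstrap term at episode boundaries
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # targets for the value function
    return advantages, returns

In practice the advantages are usually normalized (zero mean, unit variance) over the batch before being plugged into the PPO surrogate loss, which further reduces gradient variance.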
-
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting | Research/... | 2024. 12. 17. 09:30
https://arxiv.org/pdf/2206.00761
https://github.com/naver/gdc
(Nov 2022, NeurIPS 2022)
Abstract: The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a “training from scratch” to a “finetuning” paradigm. While in some applications the goal is to “nudge” the pre-trained distribution towards preferred outputs, in others it is to st..