*RL/RL
-
Miscellaneous (already overloaded list, with further updates to come)
*RL/RL 2025. 8. 23. 12:33
* Google Research: https://research.google/blog/?search=reinforcement&
* OpenAI Spinning Up: https://spinningup.openai.com/en/latest/
* Policy Gradient Algorithms: https://lilianweng.github.io/posts/20..
-
Lectures
*RL/RL 2025. 8. 21. 22:31
Reinforcement Learning By the Book: https://www.youtube.com/playlist?list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr
Foundations of Deep RL -- 6-lecture series by Pieter Abbeel: https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9RdmB3Z9N5-0IlY0
* The lectures from DeepMind and Prof. Pieter Abbeel are highly recommended. Essential viewing if you are studying RL.
* The Mutual Information YouTube series visualizes its simulations really well...
-
Aligning Language Models with Preferences through f-divergence Minimization
*RL/RL 2024. 12. 23. 07:13
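https://arxiv.org/pdf/2302.08215 (Jun 2023, ICML 2023)
Ta-da~ It's a complete gift set~! Everything unified under f-divergence. Here you go. Take your Christmas present~!
It goes without saying, but life turns out completely differently depending on how you set your goal. A change of destination can lead us in an entirely different direction. It is really fascinating that a model's behavior changes depending on the objective function (loss function). More specifically:
- which measure is used to approximate the target distribution
- how the convergence behavior differs depending on the metric
Why am I so happy reading this paper? T_T To read this paper..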
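To make the "objective changes behavior" point concrete, here is a minimal toy sketch, not the paper's method. It assumes the convention D_f(p || q) = sum_x q(x) f(p(x)/q(x)) and uses a made-up three-bin bimodal target with two made-up candidate approximations (`target`, `candidates`, and `best_candidate` are hypothetical names for illustration only). It simply asks which candidate each divergence prefers.

```python
import math

def f_divergence(p, q, f):
    """D_f(p || q) = sum_x q(x) * f(p(x) / q(x)), for strictly positive p and q.
    The argument order is an assumed convention; other texts swap it."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Generator functions f (convex, f(1) = 0) under the convention above.
GENERATORS = {
    "forward_kl": lambda t: t * math.log(t),            # gives KL(p || q)
    "reverse_kl": lambda t: -math.log(t),               # gives KL(q || p)
    "total_variation": lambda t: 0.5 * abs(t - 1.0),
    "jensen_shannon": lambda t: 0.5 * (
        t * math.log(t) - (t + 1.0) * math.log((t + 1.0) / 2.0)
    ),
}

# Hypothetical toy data: a bimodal target and two candidate approximations.
target = [0.495, 0.01, 0.495]
candidates = {
    "broad": [1 / 3, 1 / 3, 1 / 3],   # spreads mass over both modes
    "peaked": [0.98, 0.01, 0.01],     # commits to a single mode
}

def best_candidate(name):
    """Name of the candidate q that minimizes D_f(target || q)."""
    f = GENERATORS[name]
    return min(candidates, key=lambda c: f_divergence(target, candidates[c], f))

if __name__ == "__main__":
    for name in GENERATORS:
        print(name, "->", best_candidate(name))
```

On this toy data, forward KL picks the broad candidate (mass-covering) while reverse KL picks the peaked one (mode-seeking): same target, different objective, different answer.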
-
[cdpg] Controlling Conditional Language Models without Catastrophic Forgetting
*RL/RL 2024. 12. 22. 08:49
https://arxiv.org/pdf/2112.00791 (Jun 2022, ICML 2022)
Abstract: Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., halluc..
-
(3/3) GAN, F-Divergence, IPM
*RL/RL 2024. 12. 22. 00:34
1. Example of Parallel Line Density
2. Wasserstein Distance with GAN
3. Kantorovich-Rubinstein Duality
4. Wasserstein as Primal LP
5. Wasserstein as Dual LP
6. Property of Dual LP on Wasserstein Distance
7. Lipschitz Continuity
8. Dual Problem of Wasserstein Distance
9. Kantorovich-Rubinstein Duality & Wasserstein GAN
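As a rough illustration of item 1, here is a small sketch under one assumption: that the "parallel line density" example is the usual one where two distributions sit on parallel vertical lines at x = 0 and x = theta with identical spread along the line, so only the x-coordinate matters. The helper names (`wasserstein1_1d`, `js_divergence`, `compare`) are hypothetical and not taken from the post.

```python
import math

def wasserstein1_1d(xs, ys):
    """W1 between two equal-size empirical samples on the real line.
    In 1D the optimal matching pairs the sorted samples in order."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def js_divergence(p, q):
    """Jensen-Shannon divergence between discrete distributions {point: prob}."""
    total = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        m = 0.5 * (px + qx)
        if px > 0:
            total += 0.5 * px * math.log(px / m)
        if qx > 0:
            total += 0.5 * qx * math.log(qx / m)
    return total

def compare(theta, n=4):
    """Return (W1, JS) for lines at x = 0 and x = theta.
    Assumption: the coordinate along the line is distributed identically on
    both lines, so it is dropped and only the x-coordinate is compared."""
    xs = [0.0] * n
    ys = [theta] * n
    return wasserstein1_1d(xs, ys), js_divergence({0.0: 1.0}, {theta: 1.0})

if __name__ == "__main__":
    for theta in (1.0, 0.5, 0.1, 0.0):
        w1, js = compare(theta)
        print(f"theta={theta}: W1={w1:.3f}, JS={js:.3f}")
```

In this sketch W1 shrinks smoothly with theta, while JS stays at log 2 for every nonzero theta and drops to 0 only when the lines coincide. That flat-versus-smooth contrast is the usual motivation for using the Wasserstein distance in a GAN objective.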