Posts in the '*RL/RL' category

Miscellaneous (already overloaded list, with further updates to come)

*RL/RL 2025. 8. 23. 12:33

* Google Research: https://research.google/blog/?search=reinforcement&
* OpenAI Spinning Up: https://spinningup.openai.com/en/latest/
* Policy Gradient Algorithms: https://lilianweng.github.io/posts/20..

Lectures

*RL/RL 2025. 8. 21. 22:31

https://www.youtube.com/playlist?list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr (Reinforcement Learning By the Book)
https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9RdmB3Z9N5-0IlY0 (Foundations of Deep RL -- 6-lecture series by Pieter Abbeel)
* The lectures from DeepMind and Professor Pieter Abbeel are highly recommended. Essential viewing if you are studying RL.
* The Mutual Information YouTube series visualizes its simulations really well...

[2 - Sample Efficiency] Causal Reinforcement Learning

*RL/RL 2025. 7. 22. 23:21

[1 - generalization] Causal Reinforcement Learning

*RL/RL 2025. 7. 22. 18:26

https://openreview.net/pdf?id=qqnttX9LPo
↑ Typo... In equation (2), the action-value formula Q(s,a), the transition probability is missing from the summation!! (Bellman would not approve of this typo.)
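For reference, here is the standard textbook form of the action-value Bellman equation with the transition probability inside the summation. The symbols (P for the transition probability, r for the reward, \gamma for the discount factor, \pi for the policy) are assumed notation and may not match the paper's equation (2) exactly.

```latex
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)
  \Big[ r(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') \, Q^{\pi}(s', a') \Big]
```

Dropping P(s' \mid s, a) turns the expectation over next states into an unweighted sum, so the result is no longer an expected return.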

f-DPG

*RL/RL 2024. 12. 23. 14:11

* f-divergence
* f-divergence examples (KL-divergence, Total Variation Distance)
* Aligning LMs with Preferences through f-divergence Minimization
* Algorithm
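As a rough illustration of the two examples listed above, here is a minimal sketch of an f-divergence on discrete distributions. It assumes the convention D_f(P || Q) = sum_x q(x) f(p(x)/q(x)) with q(x) > 0 everywhere; papers differ on argument order, and the function names `f_divergence`, `f_kl`, and `f_tv` are made up for this example.

```python
import math


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)).

    Assumes p and q are probability lists over the same support
    and that q(x) > 0 for every x (an assumption of this sketch).
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def f_kl(t):
    # Generator of KL(P || Q): f(t) = t * log(t), with f(0) taken as 0.
    return t * math.log(t) if t > 0 else 0.0


def f_tv(t):
    # Generator of Total Variation Distance: f(t) = |t - 1| / 2.
    return 0.5 * abs(t - 1.0)


p = [0.5, 0.5]
q = [0.25, 0.75]

kl = f_divergence(p, q, f_kl)  # equals sum_x p(x) * log(p(x) / q(x))
tv = f_divergence(p, q, f_tv)  # equals 0.5 * sum_x |p(x) - q(x)|
print(kl, tv)
```

Only the generator f changes between the two calls, which is what makes it possible to treat many divergences under one objective.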

Aligning Language Models with Preferences through f-divergence Minimization

*RL/RL 2024. 12. 23. 07:13

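https://arxiv.org/pdf/2302.08215 (Jun 2023, ICML 2023)
Ta-da~ It's an all-in-one gift set~! Everything united under f-divergence. Here you go, take your Christmas present~!
It goes without saying, but life turns out completely differently depending on how you set your goal. Changing the destination can send us in an entirely different direction. It is really fascinating that a model's behavior changes depending on the objective function (loss function).
(More specifically..
- Which measure should be used to approximate the target distribution?
- How does the convergence behavior change depending on the metric?)
Why am I so happy reading this paper? T_T To read this paper..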

[cdpg] Controlling Conditional Language Models without Catastrophic Forgetting

*RL/RL 2024. 12. 22. 08:49

https://arxiv.org/pdf/2112.00791 (Jun 2022, ICML 2022)
Abstract: Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., halluc..

(3/3) GAN, F-Divergence, IPM

*RL/RL 2024. 12. 22. 00:34

1. Example of Parallel Line Density
2. Wasserstein Distance with GAN
3. Kantorovich-Rubinstein Duality
4. Wasserstein as Primal LP
5. Wasserstein as Dual LP
6. Property of Dual LP on Wasserstein Distance
7. Lipschitz Continuity
8. Dual Problem of Wasserstein Distance
9. Kantorovich-Rubinstein Duality & Wasserstein GAN
