[1 - generalization] Causal Reinforcement Learning
2025. 7. 22. 18:26
https://openreview.net/pdf?id=qqnttX9LPo

















↑ Typo: in equation (2), the action-value formula Q(s,a), the transition probability is missing from the summation!!
(Bellman hates this typo.)
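
For reference, the standard Bellman expectation equation for the action-value function weights each successor state by its transition probability. The notation below (P, R, γ, V^π, π) is the usual textbook one and is assumed here; it may not match the symbols used in the paper.

```latex
Q^{\pi}(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[ R(s,a,s') + \gamma\, V^{\pi}(s') \bigr],
\qquad
V^{\pi}(s') = \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s',a')
```

Without the factor P(s' | s,a), the sum just adds the bracketed term over every successor state instead of taking an expectation over them, so the result is no longer an expected return.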














