Vanilla Policy Gradient Implementation
2024. 12. 15. 10:40