f-DPG

밤 편지 2024. 12. 23. 14:11

* f-divergence


* f-divergence examples (KL-divergence, Total Variation Distance)


* Aligning LMs with Preferences through f-divergence Minimization


* Algorithm
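
As a small illustration of the f-divergence items in the outline above, here is a minimal sketch for discrete distributions. It assumes the convention D_f(p || q) = sum_x q(x) f(p(x) / q(x)) with f convex and f(1) = 0; papers differ on argument order, so this may not match the notation of the f-DPG paper. The names `f_divergence`, `f_kl`, and `f_tv` and the example distributions are hypothetical, chosen only for this sketch, and this is not the f-DPG algorithm itself.

```python
import math

def f_divergence(p, q, f):
    """D_f(p || q) = sum_x q(x) * f(p(x) / q(x)).

    Assumes p and q are lists of probabilities over the same finite
    support and that q(x) > 0 for every x (an assumption of this sketch).
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(t):
    # Generator for KL(p || q) under the convention above: f(t) = t log t,
    # with the usual limit 0 log 0 = 0.
    return t * math.log(t) if t > 0 else 0.0

def f_tv(t):
    # Generator for Total Variation Distance: f(t) = |t - 1| / 2.
    return 0.5 * abs(t - 1.0)

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

kl = f_divergence(p, q, f_kl)  # equals sum_x p(x) log(p(x) / q(x))
tv = f_divergence(p, q, f_tv)  # equals 0.5 * sum_x |p(x) - q(x)|
print(f"KL(p || q) = {kl:.4f}, TV(p, q) = {tv:.4f}")
```

Swapping the generator `f` is the only change needed to move between divergences, which is the property the f-divergence framing relies on.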