Research/.....
-
[SimPO] Simple Preference Optimization with a Reference-Free Reward (2024. 12. 14. 12:11)
https://arxiv.org/pdf/2405.14734
https://github.com/princeton-nlp/SimPO
May 2024 (NeurIPS 2024)
Abstract: Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability. In this work, we propose SimPO, a simpler yet more effective app..
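The excerpt points at SimPO's key idea: the implicit reward is the length-normalized log-probability of a response under the current policy, with a target reward margin and no reference model. A minimal sketch of such a loss, assuming summed per-sequence log-probs and illustrative names (beta, gamma, *_lens) not taken from the paper's code:

```python
# Hedged sketch of a SimPO-style loss; hyperparameters and argument names are assumptions.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=0.5):
    """chosen_logps / rejected_logps: summed token log-probs per sequence (batch,)."""
    # Length-normalized implicit rewards; no reference model is involved.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Bradley-Terry style logistic loss with a target reward margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```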
-
[ORPO] Monolithic Preference Optimization without Reference Model (2024. 12. 14. 12:10)
https://arxiv.org/pdf/2403.07691
https://github.com/xfactlab/orpo
Abstract: While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfa..
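The excerpt describes combining SFT with a small penalty on the disfavored response, without a reference model. A rough sketch under the assumption that the penalty is an odds-ratio term over length-averaged likelihoods (the lambda weight and exact normalization are illustrative, not confirmed by the excerpt):

```python
# Hedged sketch of an ORPO-style objective; details beyond "SFT + penalty on the
# disfavored response" are assumptions for illustration.
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens, lam=0.1):
    """*_logps: summed token log-probs per sequence; *_lens: token counts."""
    # Length-averaged log-probabilities, i.e. log p(y|x) per token.
    logp_w = chosen_logps / chosen_lens
    logp_l = rejected_logps / rejected_lens
    # log odds(y|x) = log p - log(1 - p), computed in log space for stability.
    log_odds_w = logp_w - torch.log1p(-torch.exp(logp_w))
    log_odds_l = logp_l - torch.log1p(-torch.exp(logp_l))
    # Penalty that pushes the chosen odds above the rejected odds.
    or_term = -F.logsigmoid(log_odds_w - log_odds_l).mean()
    # Conventional SFT term: per-token NLL of the chosen response.
    sft_term = -logp_w.mean()
    return sft_term + lam * or_term
```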
-
[DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model (2024. 12. 14. 12:08)
https://arxiv.org/pdf/2305.18290
May 2023 (NeurIPS 2023)
Abstract: While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generati..
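DPO trains directly on preference pairs with a simple classification-style loss, where the implicit reward is the log-ratio between the policy and a frozen reference model. A minimal sketch, assuming summed per-sequence log-probs and an illustrative beta:

```python
# Hedged sketch of the DPO loss; argument names and beta are illustrative.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is the summed token log-prob of a response (batch,)."""
    # Implicit rewards: scaled log-ratios between the policy and the reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic (binary cross-entropy) loss preferring the chosen response.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```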
-
[PPO] Proximal Policy Optimization Algorithms (2024. 12. 13. 14:42)
https://arxiv.org/pdf/1707.06347
Aug 2017
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel ob..
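The surrogate objective the abstract refers to is PPO's clipped objective: the probability ratio between the new and old policy is clipped so several minibatch epochs on the same samples cannot move the policy too far. A short sketch, with epsilon and the argument names chosen for illustration:

```python
# Hedged sketch of the PPO clipped surrogate loss (negated for gradient descent).
import torch

def ppo_clip_loss(new_logps, old_logps, advantages, epsilon=0.2):
    """new_logps / old_logps: per-action log-probs; advantages: estimated A_t."""
    ratio = torch.exp(new_logps - old_logps)            # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon) * advantages
    # Pessimistic (lower) bound of the two, averaged over the batch.
    return -torch.min(unclipped, clipped).mean()
```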