티스토리

밤에 쓰는 편지

검색하기

[6/6] Policy Gradient Methods

Research/RL_DeepMind

[6/6] Policy Gradient Methods

밤 편지 2024. 8. 10. 17:23

https://www.youtube.com/watch?v=e20EY4tFC_Q&list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr&index=6

Policy gradient methods take a more direct approach to the problem statement of RL and as a result, many of the most effective models are from this category. For example, Proxmal Policy Optimization is a type of policy gradient method, and that's OpenAI's go to RL algorithm. In fact, that's what they use to incorporate human feedback into ChatGPT's training. Considering how much that product has grown, it's pretty clear, these techniques have serious real world value.