[DeepSeekMath] Pushing the Limits of Mathematical Reasoning in Open Language Models
(Apr 2024)
Built on meticulous training-data processing and an innovation in the RL algorithm (GRPO), the paper demonstrates a remarkable improvement in mathematical reasoning performance.
It shows the effectiveness of the RL phase, analyzes which factors that effectiveness stems from, and suggests directions for successful RL training going forward.
https://arxiv.org/pdf/2402.03300
https://github.com/deepseek-ai/DeepSeek-Math
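Since GRPO (Group Relative Policy Optimization) is the paper's core algorithmic contribution, below is a minimal sketch of its group-relative advantage and clipped objective, written from the paper's description. This is illustrative PyTorch under my own assumptions: the function name `grpo_loss`, the tensor shapes, and the default hyperparameter values are mine, not the authors' implementation, and padding/length masking for variable-length outputs is omitted.

```python
# Minimal sketch of GRPO, written from the paper's description;
# names, shapes, and defaults are illustrative, not the official code.
import torch

def grpo_loss(logp_new, logp_old, logp_ref, rewards,
              clip_eps=0.2, kl_coef=0.04):
    """
    logp_new / logp_old / logp_ref: (G, T) per-token log-probs of the
        current, sampling-time, and frozen reference (SFT) policies for
        a group of G outputs sampled for the same question.
    rewards: (G,) scalar reward per output (e.g., final-answer correctness).
    """
    # Group-relative advantage: normalize rewards within the group.
    # This replaces PPO's learned value-function baseline entirely.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # (G,)
    adv = adv.unsqueeze(-1)                                    # broadcast over tokens

    # PPO-style clipped surrogate on the per-token importance ratio.
    ratio = (logp_new - logp_old).exp()
    clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = torch.minimum(ratio * adv, clipped * adv)

    # Unbiased KL estimator to the reference policy, added as a separate
    # penalty term instead of being folded into the reward.
    log_r = logp_ref - logp_new
    kl = log_r.exp() - log_r - 1.0

    # Maximize surrogate minus KL penalty -> minimize the negative.
    return -(surrogate - kl_coef * kl).mean()
```

The point worth noticing is that the baseline comes from comparing multiple sampled answers to the same question, which mirrors how the reward model is trained on relative comparisons, and it removes the memory and compute cost of a separate value network.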

※ Math Pre-Training / Supervised Fine-Tuning: the paper explains the data curation, training, and evaluation well. I read these sections but omit them here for brevity.