[d1] Scaling Reasoning in dLLMs via RL (LLMs/Diffusion, 2026. 5. 7. 18:31)
https://dllm-reasoning.github.io/
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
The table below shows a detailed performance comparison across benchmarks and generation sequence lengths. d1-LLaDA consistently outperforms all other models, and diffu-GRPO alone yields better performance than SFT alone. Table: Model performance on GS
(NeurIPS 2025 spotlight)
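d1's diffu-GRPO stage builds on GRPO-style reinforcement learning, which replaces a learned critic with group-relative advantages: several completions are sampled per prompt, and each reward is normalized against its group's statistics. A minimal sketch of that advantage computation (the function name and epsilon value are illustrative, not from the paper):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and std of its group, so no value/critic model is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled completions for one prompt, binary correctness reward.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, and the advantages of a group sum to zero, which is what makes the signal usable without a baseline network.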