IMDM

Infinite Mask Diffusion for Few-Step Distillation

KAIST

TL;DR: We propose the Infinite Mask Diffusion Model (IMDM), which leverages the simple design and effective conditional generation of Masked Diffusion Models while overcoming the theoretical lower bound on their factorization error.

IMDM generates high-quality samples in a few steps, significantly outperforming existing few-step distillation methods at small step counts on both the LM1B and OpenWebText datasets.

Abstract

Masked Diffusion Models (MDMs) have emerged as a promising alternative to autoregressive models in language modeling, offering parallel decoding and bidirectional context processing within a simple yet effective framework. Specifically, the explicit distinction between masked tokens and data is what underpins this simple framework and enables effective conditional generation. However, MDMs typically require many sampling iterations due to factorization errors stemming from simultaneous token updates. We observe that the factorization error has a theoretical lower bound, which standard MDMs cannot reduce due to their use of a deterministic single-state mask. In this paper, we propose the Infinite Mask Diffusion Model (IMDM), which introduces a stochastic infinite-state mask to mitigate the theoretical bound while directly inheriting the benefits of MDMs, including compatibility with pre-trained weights. We empirically demonstrate that MDMs fail at few-step generation even on a simple synthetic task because of this error bound, whereas IMDM finds an efficient solution to the same task. Finally, when equipped with appropriate distillation methods, IMDM surpasses existing few-step distillation methods at small step counts on LM1B and OpenWebText.
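
To make the factorization error concrete, the following toy sketch (illustrative only; it is not the paper's synthetic task or code) uses two perfectly correlated tokens. Even an oracle MDM that knows the exact per-position posteriors produces invalid pairs half the time in one step, because the deterministic mask gives the two parallel draws no shared randomness to correlate them.

import itertools
import random

random.seed(0)
VOCAB = ["A", "B"]

# Ground-truth data: the two tokens are perfectly correlated, so the only
# valid sequences are ("A", "A") and ("B", "B"), each with probability 0.5.

def mdm_one_step_sample():
    # Oracle MDM from the all-[MASK] state: the exact per-position marginal
    # is uniform over {A, B}, and parallel (factorized) decoding samples the
    # two positions independently of each other.
    return (random.choice(VOCAB), random.choice(VOCAB))

n = 100_000
counts = {pair: 0 for pair in itertools.product(VOCAB, repeat=2)}
for _ in range(n):
    counts[mdm_one_step_sample()] += 1

invalid = (counts[("A", "B")] + counts[("B", "A")]) / n
print(f"fraction of invalid pairs: {invalid:.3f}")  # ~0.500: the error floor

Any single-state mask hits this floor: every fully masked input is identical, so the model's per-position outputs are fixed distributions, and sampling them in parallel is necessarily factorized.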

Key Innovations

  1. A theoretical lower bound on factorization error in MDMs.

    We identify a fundamental lower bound on the factorization error that standard Masked Diffusion Models cannot escape, originating from their use of a single deterministic mask state. This bound explains why MDMs degrade sharply in the few-step regime.

  2. Infinite Mask Diffusion Model (IMDM) with a stochastic infinite-state mask.

    IMDM replaces the deterministic single-state mask with a stochastic, infinite-state mask, mitigating the theoretical error bound while preserving the simplicity and conditional-generation benefits of MDMs, including direct compatibility with pre-trained MDM weights (see the conceptual sketch after this list).
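
The sketch below contrasts the stochastic mask with the deterministic one from the toy example above. It is a conceptual illustration under loudly labeled assumptions: the stochastic mask is realized here as a per-position Gaussian state, and the decoder is hand-crafted rather than learned; IMDM's actual parameterization and training objective are described in the paper.

import random

random.seed(0)

def stochastic_mask_states(num_positions):
    # Infinite-state mask: every masked position draws a continuous state
    # instead of reusing one fixed [MASK] embedding (a standard Gaussian is
    # a hypothetical choice made for this illustration).
    return [random.gauss(0.0, 1.0) for _ in range(num_positions)]

def one_step_decode(mask_states):
    # Hand-crafted stand-in for a trained denoiser: both tokens are decoded
    # from the same piece of mask randomness, so they come out perfectly
    # correlated. In IMDM this mapping is learned, not hard-coded.
    token = "A" if mask_states[0] > 0.0 else "B"
    return (token, token)

n = 100_000
counts = {}
for _ in range(n):
    pair = one_step_decode(stochastic_mask_states(2))
    counts[pair] = counts.get(pair, 0) + 1

# ~{('A', 'A'): 0.5, ('B', 'B'): 0.5}: one step now matches the joint.
print({pair: count / n for pair, count in sorted(counts.items())})

Because the decoder can read the sampled mask states jointly, the randomness of the mask itself carries the correlation between positions, which is exactly the degree of freedom a deterministic single-state mask lacks.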


Results on LM1B

IMDM achieves strong few-step generation on the LM1B benchmark, outperforming existing distillation methods at small step counts.

[Figure: unconditional generation perplexity on LM1B across few-step regimes; panels compare IMDM vs. SDTT, vs. ReDi, and vs. SDTT + ReDi.]


Results on OpenWebText

IMDM continues to outperform baselines on OpenWebText (OWT).

[Figure: generation quality on OpenWebText across unconditional and conditional settings; panels show unconditional PPL, conditional PPL, and conditional MAUVE.]

860M Model Results on OpenWebText

IMDM scales effectively to the 860M model, consistently outperforming baselines on OpenWebText (OWT).

[Figure: generation quality of the 860M model on OpenWebText across unconditional and conditional settings; panels show unconditional PPL, conditional PPL, and conditional MAUVE.]

Citation

@inproceedings{yoo2026imdm,
  title={Infinite Mask Diffusion for Few-Step Distillation},
  author={Yoo, Jaehoon and Kim, Wonjung and Lee, Chanhyuk and Hong, Seunghoon},
  year={2026},
  booktitle={ICML}
}