
Congratulations and introducing a similar work: XDLM #8


@MzeroMiko

Thank you for sharing this exciting work! I particularly agree with your approach of combining quality and speed modes.

In XDLM, we have also explored combining T2T and M2T via a stationary noise kernel and observed the same phenomena of unmasking, refinement, and remasking.

The left part of the image below shows how XDLM combines the noise kernels of UDLM (u) and MDLM (m) to achieve a favorable trade-off between the two methods. [NORMAL] denotes standard tokens, while [MASK] represents the mask token.
The right part illustrates the trade-off between understanding capability (zero-shot perplexity) and generation capability (generation perplexity at 32 sampling steps). XDLM with a mixing ratio of 0.1 achieves the best balance, labeled "Sweet Spot" in the figure.
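To make the kernel mixing concrete, here is a minimal sketch of forward corruption under one plausible hybrid kernel. It assumes the common form q(x_t | x_0) = α_t·onehot(x_0) + (1 − α_t)·π, where the stationary distribution π puts weight k on [MASK] and spreads 1 − k uniformly over the normal tokens. The names `stationary_dist`, `corrupt`, and `k` are hypothetical, and it is an assumption which component XDLM's reported mixing ratio of 0.1 actually weights; the linked repositories have the real parameterization.

```python
import random


def stationary_dist(vocab_size: int, mask_id: int, k: float) -> list[float]:
    # Hypothetical mix: weight k on [MASK] (MDLM-style absorbing state),
    # weight 1 - k spread uniformly over the normal tokens (UDLM-style).
    normal_p = (1.0 - k) / (vocab_size - 1)
    return [k if v == mask_id else normal_p for v in range(vocab_size)]


def corrupt(x0: list[int], alpha_t: float, pi: list[float], rng: random.Random) -> list[int]:
    # Sample x_t ~ q(x_t | x_0) = alpha_t * onehot(x_0) + (1 - alpha_t) * pi:
    # keep each token with probability alpha_t, otherwise redraw it from pi.
    vocab = range(len(pi))
    return [tok if rng.random() < alpha_t else rng.choices(vocab, weights=pi)[0]
            for tok in x0]


# Usage: a 10-token vocabulary whose last id (9) is [MASK].
rng = random.Random(0)
pi = stationary_dist(vocab_size=10, mask_id=9, k=0.1)
xt = corrupt([1, 2, 3, 4, 5], alpha_t=0.5, pi=pi, rng=rng)
```

Under this assumed parameterization, k = 1 recovers an MDLM-style absorbing kernel and k = 0 a UDLM-style uniform kernel over normal tokens, so a single ratio interpolates between the two regimes.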

With the unified stationary noise kernel, we derived the posterior probability and KL divergence along with its limiting case:
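For a kernel that interpolates toward a stationary distribution π, the posterior follows from Bayes' rule. The sketch below assumes q(x_t | x_0) = α_t·onehot(x_0) + (1 − α_t)·π with a generic π and one-step kernel ratio α_t/α_s; it is the standard derivation for this family of kernels, not a reproduction of XDLM's own equations, and it omits the KL divergence and limiting case. The function names are hypothetical.

```python
def marginal(x0: int, alpha: float, pi: list[float]) -> list[float]:
    # q(x | x_0) = alpha * onehot(x_0) + (1 - alpha) * pi
    return [alpha * (v == x0) + (1.0 - alpha) * p for v, p in enumerate(pi)]


def posterior(xt: int, x0: int, alpha_s: float, alpha_t: float,
              pi: list[float]) -> list[float]:
    # q(x_s = j | x_t, x_0) for s < t (so alpha_s > alpha_t), via Bayes' rule:
    #   q(x_s = j | x_t, x_0) = q(x_t | x_s = j) * q(x_s = j | x_0) / q(x_t | x_0)
    # with the assumed one-step kernel
    #   q(x_t | x_s = j) = a * [x_t == j] + (1 - a) * pi[x_t],  a = alpha_t / alpha_s.
    a = alpha_t / alpha_s
    prior = marginal(x0, alpha_s, pi)
    unnorm = [(a * (xt == j) + (1.0 - a) * pi[xt]) * prior[j]
              for j in range(len(pi))]
    z = marginal(x0, alpha_t, pi)[xt]  # normalizer q(x_t | x_0)
    return [u / z for u in unnorm]
```

As a sanity check, with π a point mass on [MASK] (purely absorbing) and x_t = [MASK], this gives probability (α_s − α_t)/(1 − α_t) that x_s = x_0, the familiar MDLM unmasking rate.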

The figure below shows part of the step-wise evolution of a generated sequence (T = [8/32]). XDLM exhibits three transition dynamics inherent to the hybrid noise process: green marks new tokens generated from [MASK]; blue marks lexical refinement; and red marks re-masking, where previously generated tokens are rejected and reverted to [MASK].
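These three dynamics can be identified mechanically by comparing consecutive sampling steps position by position. A minimal sketch, assuming sequences are lists of token strings and using a hypothetical helper `classify_transitions`:

```python
MASK = "[MASK]"


def classify_transitions(prev: list[str], curr: list[str]) -> list[str]:
    # Label each position's change between two consecutive sampling steps,
    # following the color scheme in the figure description.
    if len(prev) != len(curr):
        raise ValueError("sequences must have the same length")
    labels = []
    for a, b in zip(prev, curr):
        if a == b:
            labels.append("unchanged")
        elif a == MASK:
            labels.append("unmask")   # green: new token generated from [MASK]
        elif b == MASK:
            labels.append("remask")   # red: token rejected, reverted to [MASK]
        else:
            labels.append("refine")   # blue: token replaced by a different token
    return labels


# Usage with a hypothetical pair of consecutive steps.
labels = classify_transitions(["[MASK]", "the", "cat", "sat"],
                              ["A", "the", "dog", "[MASK]"])
```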

This phenomenon is also observed in image generation:

When scaled up to fine-tune an 8B-parameter large language model, XDLM achieves an MBPP score of 15.0 in just 32 sampling steps, effectively doubling the baseline performance. The results below show LLaDA-XDLM with a sampling budget of 32 steps. Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.

Please refer to https://github.com/MzeroMiko/XDLM and https://github.com/MzeroMiko/LLaDA-XDLM for details.
