[Feature] Multi-teacher distillation support #1399

Description

@zahrayousefijamarani

Checklist

  • This feature will maintain backward compatibility with the current APIs in
    areal/api/. If not, please raise a refactor issue first.

Background

The current AReaL implementation supports knowledge distillation (KD) from a single teacher model in two settings:

  • On-policy reverse KL distillation (RKL)
  • Combined GRPO + KD (KDRL)

However, many practical training setups benefit from using multiple teacher models simultaneously, where different teachers specialize in different capabilities (e.g., reasoning, instruction-following, domain-specific skills).

This motivates extending the current framework to support multi-teacher distillation via a weighted mixture distribution.
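For concreteness, one way to write the token-level mixture is shown below. The notation is ours, not from AReaL: x is the prompt, y_{<t} is the prefix the student sampled, p_k is teacher k's next-token distribution, and w_k is its weight, assumed non-negative and normalized to sum to 1.

```math
\begin{aligned}
p_{\text{mix}}(y_t \mid x, y_{<t}) &= \sum_{k=1}^{K} w_k \, p_k(y_t \mid x, y_{<t}), \qquad w_k \ge 0,\ \sum_{k=1}^{K} w_k = 1, \\
\log p_{\text{mix}}(y_t \mid x, y_{<t}) &= \log \sum_{k=1}^{K} \exp\big(\log w_k + \log p_k(y_t \mid x, y_{<t})\big).
\end{aligned}
```

The second line is the log-sum-exp form of the first, which avoids underflow when individual token probabilities are tiny. With K = 1 and w_1 = 1 it reduces to the single teacher's log-probability.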

Potential Solution

A straightforward implementation is:

  1. Each teacher computes token-level log-probabilities for the trajectories sampled by the student.
  2. Teacher outputs are stacked along a new teacher dimension.
  3. The mixture distribution is computed in log space using a numerically stable aggregation:
    3.1. Normalize the teacher weights to sum to 1 and take their logarithm.
    3.2. Add each teacher's log-weight to its log-probabilities and combine the results across teachers via log-sum-exp.
  4. The resulting teacher_logp is treated as a single unified teacher signal.
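A minimal PyTorch sketch of steps 2 to 4 follows. It assumes step 1 has already produced, for each teacher, a tensor of log-probabilities of the student's sampled tokens, all with the same shape (e.g. [batch, seq_len]). The helper name mix_teacher_logp and its signature are hypothetical and not an existing AReaL API.

```python
import torch


def mix_teacher_logp(teacher_logps: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    """Combine per-token teacher log-probs into a weighted mixture (hypothetical helper).

    teacher_logps: K tensors of identical shape, e.g. [batch, seq_len], holding each
        teacher's log-probability of the tokens the student sampled.
    weights: K non-negative mixture weights; they are normalized to sum to 1.
    """
    if len(teacher_logps) != len(weights):
        raise ValueError("need exactly one weight per teacher")
    # Step 2: stack along a new leading teacher dimension -> [K, *shape].
    stacked = torch.stack(teacher_logps, dim=0)
    # Step 3.1: normalize the weights and move them to log space.
    w = torch.as_tensor(weights, dtype=stacked.dtype, device=stacked.device)
    if (w < 0).any() or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    log_w = torch.log(w / w.sum())
    # Reshape to [K, 1, ..., 1] so the weights broadcast over the token dimensions.
    log_w = log_w.view(-1, *([1] * (stacked.dim() - 1)))
    # Step 3.2: log sum_k w_k p_k = logsumexp_k(log w_k + log p_k), computed stably.
    return torch.logsumexp(log_w + stacked, dim=0)


# Usage: two teachers scoring the same 2 x 3 batch of student-sampled tokens.
t1 = torch.log(torch.tensor([[0.5, 0.2, 0.9], [0.1, 0.4, 0.3]]))
t2 = torch.log(torch.tensor([[0.1, 0.6, 0.5], [0.3, 0.2, 0.7]]))
teacher_logp = mix_teacher_logp([t1, t2], weights=[3.0, 1.0])  # normalized to 0.75 / 0.25
```

A zero weight is allowed: log 0 = -inf simply drops that teacher from the log-sum-exp. With a single teacher and weight 1.0 the function returns that teacher's log-probabilities unchanged, which is the backward-compatible single-teacher case.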

This design has several advantages:

  1. No changes are required to the existing KD or KDRL objectives.
  2. Backward compatibility with single-teacher setups is preserved.
  3. It works uniformly for both the rollout and train engines.
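To illustrate the first advantage, the sketch below computes a per-token on-policy reverse-KL signal that only consumes a teacher_logp tensor, so it cannot tell whether that tensor came from one teacher or from a mixture. It assumes the common on-policy distillation formulation in which the negated single-sample estimate log pi(y_t) - log q(y_t) is used as a detached per-token advantage; whether AReaL's RKL path does exactly this is an assumption, and the function name rkl_token_advantages is hypothetical.

```python
import torch


def rkl_token_advantages(student_logp: torch.Tensor, teacher_logp: torch.Tensor,
                         mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-token signal for on-policy reverse-KL distillation.

    student_logp, teacher_logp: [batch, seq_len] log-probs of the tokens the
    student sampled; mask: 1 for response tokens, 0 for prompt/padding.
    Returns -(log pi(y_t) - log q(y_t)), a single-sample estimate of the negative
    per-token reverse KL, detached so it can be used as an advantage.
    """
    rkl = student_logp.detach() - teacher_logp.detach()
    return -rkl * mask.to(rkl.dtype)


# The caller decides where teacher_logp comes from; the function stays the same.
student = torch.log(torch.tensor([[0.4, 0.5, 0.8]]))
single_teacher = torch.log(torch.tensor([[0.5, 0.2, 0.9]]))
# Mixture with weights 0.75 / 0.25, written in probability space for brevity.
mixed_teacher = torch.log(0.75 * single_teacher.exp() + 0.25 * torch.tensor([[0.1, 0.6, 0.5]]))
mask = torch.tensor([[0.0, 1.0, 1.0]])  # first position is a prompt token

adv_single = rkl_token_advantages(student, single_teacher, mask)
adv_mixed = rkl_token_advantages(student, mixed_teacher, mask)
```

Because log p_mix is at least log(w_k p_k) for every k, a token that any heavily weighted teacher finds likely receives a smaller penalty under the mixture than under a teacher that rejects it.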

Additional Information

This feature is particularly useful for:

  • Mixing large and small teacher models
  • Combining domain-specialized teachers (e.g., math + coding)
  • Ensembling checkpoints from different training stages
  • Improving the robustness of the distillation signal when teachers disagree

It is also a natural extension of the existing on-policy distillation, as the student still samples trajectories from its own policy while receiving supervision from a richer, aggregated teacher distribution.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
