[Feature] Multi-teacher distillation support #1399

## Checklist

- [x] This feature will maintain backward compatibility with the current APIs in
  `areal/api/`. If not, please raise a refactor issue first.

## Background

The current AReaL implementation supports knowledge distillation from a single teacher model in two modes:

- On-policy reverse KL distillation (RKL)
- Combined GRPO + KD (KDRL)

However, many practical training setups benefit from using multiple teacher models simultaneously, where different teachers specialize in different capabilities (e.g., reasoning, instruction-following, domain-specific skills).

This motivates extending the current framework to support multi-teacher distillation via a weighted mixture distribution.
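
To make "weighted mixture distribution" concrete, one possible formulation is written out here. The symbols are assumptions introduced for illustration and do not come from the AReaL codebase: $K$ teachers with per-token policies $\pi_k$, and non-negative weights $w_k$ that sum to 1.

$$
\pi_{\text{mix}}(y_t \mid x, y_{<t}) = \sum_{k=1}^{K} w_k \, \pi_k(y_t \mid x, y_{<t}), \qquad \sum_{k=1}^{K} w_k = 1
$$

$$
\log \pi_{\text{mix}}(y_t \mid x, y_{<t}) = \operatorname{logsumexp}_{k} \bigl( \log w_k + \log \pi_k(y_t \mid x, y_{<t}) \bigr)
$$

With $K = 1$ and $w_1 = 1$, the second expression reduces to $\log \pi_1$, which is the existing single-teacher case.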

## Potential Solution

A straightforward implementation is:

1. Each teacher computes token-level log-probabilities for the sampled trajectories.
2. The teacher outputs are stacked along a teacher dimension.
3. The mixture distribution is computed in log space using a numerically stable aggregation:
   1. Normalize the teacher weights and convert them to log space.
   2. Add the log-weights to the log-probabilities and combine across teachers via log-sum-exp.
4. The resulting `teacher_logp` is treated as a single unified teacher signal.
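
The steps above could be sketched in PyTorch roughly as follows. This is a minimal illustration, not the AReaL implementation: the function name `mix_teacher_logprobs`, the `[num_teachers, batch, seq_len]` layout, and the uniform-weight default are all assumptions made for the example.

```python
import torch


def mix_teacher_logprobs(teacher_logps, weights=None):
    """Combine per-teacher token log-probs into one mixture log-prob.

    teacher_logps: tensor of shape [num_teachers, batch, seq_len] holding each
        teacher's log-probability of the sampled token (assumed layout).
    weights: optional positive teacher weights; defaults to uniform.
    """
    num_teachers = teacher_logps.shape[0]
    if weights is None:
        weights = torch.ones(num_teachers)
    weights = torch.as_tensor(
        weights, dtype=teacher_logps.dtype, device=teacher_logps.device
    )
    # Step 3.1: normalize the weights and move them to log space.
    log_w = torch.log(weights) - torch.log(weights.sum())
    # Reshape to [num_teachers, 1, 1, ...] so the weights broadcast over tokens.
    log_w = log_w.view(num_teachers, *([1] * (teacher_logps.dim() - 1)))
    # Step 3.2: log sum_k w_k * p_k, computed stably with log-sum-exp.
    return torch.logsumexp(teacher_logps + log_w, dim=0)


# Two teachers, one sequence of two tokens; the weights 3:1 normalize to 0.75/0.25.
teacher_logps = torch.log(torch.tensor([[[0.5, 0.2]], [[0.1, 0.6]]]))
mixed = mix_teacher_logprobs(teacher_logps, weights=[3.0, 1.0])
```

Note that this mixes the teachers per token, so the result is the log-probability of each sampled token under a token-level mixture rather than a mixture over whole sequences.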

This design has the following advantages:

1. No changes are required to the existing KD or KDRL objectives.
2. Backward compatibility with single-teacher setups is preserved.
3. It works uniformly for both rollout and train engines.
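
As a rough check of the backward-compatibility claim, the sketch below passes a single teacher with weight 1 through the same log-sum-exp aggregation and compares a downstream loss. `reverse_kl_estimate` is a hypothetical stand-in for a per-token reverse KL objective (a single-sample estimate on student-sampled tokens), not the objective actually used in AReaL.

```python
import torch


def reverse_kl_estimate(student_logp, teacher_logp, mask):
    # Hypothetical objective: masked mean of log pi_student - log pi_teacher
    # on tokens sampled from the student.
    per_token = (student_logp - teacher_logp) * mask
    return per_token.sum() / mask.sum().clamp(min=1)


student_logp = torch.log(torch.tensor([[0.5, 0.25]]))
teacher_a = torch.log(torch.tensor([[0.4, 0.5]]))
mask = torch.ones_like(student_logp)

# Single teacher, weight 1: stack along a teacher dimension, add log(1) = 0,
# and reduce with log-sum-exp. The output should equal the input log-probs.
log_w = torch.log(torch.tensor(1.0))
single = torch.logsumexp(teacher_a.unsqueeze(0) + log_w, dim=0)

loss_single = reverse_kl_estimate(student_logp, single, mask)
loss_direct = reverse_kl_estimate(student_logp, teacher_a, mask)
```

Because the aggregated tensor has the same shape and values as the single teacher's log-probabilities, the objective sees no difference between the two paths.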

## Additional Information

This feature is particularly useful for:

- Mixing large and small teacher models
- Combining domain-specialized teachers (e.g., math + coding)
- Ensembling checkpoints from different training stages
- Improving the robustness of the distillation signal when teachers disagree

It is also a natural extension of current on-policy distillation, as the student still samples trajectories from its own policy while receiving supervision from a richer, aggregated teacher distribution.

