long_tail detection triggers too early, causing token repetition before forced EOS #519

## Problem

The `alignment_stream_analyzer` in the Multilingual model forces EOS via `long_tail=tensor(True)` at very low sample counts (~60-180 out of 1000 max), causing the model to repeat tokens right before being cut off. This results in duplicated phrases in the generated audio.

## Reproduction

**Setup:**
- Model: `ChatterboxMultilingualTTS.from_pretrained(device="cuda")`
- Mode: Zero-shot voice cloning with German reference WAV
- Hardware: CUDA (DGX Spark / NVIDIA GB10)
- Python 3.12, Linux arm64

**Trigger:** Split long text into sentences and generate each individually. Multiple sentences get terminated early by the analyzer.
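
A minimal sketch of the per-sentence loop described above. The regex splitter is an assumption (the report does not say how the text was split), and the `generate(text, language_id=..., audio_prompt_path=...)` call shape is assumed from the project README rather than verified against the installed version.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'.

    Illustrative only. It will mis-split German abbreviations such as "z. B.".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def generate_per_sentence(model, text: str, reference_wav: str, language_id: str = "de"):
    """Generate one waveform per sentence, mirroring the trigger condition."""
    wavs = []
    for sentence in split_sentences(text):
        # Assumed call shape; adjust if the installed chatterbox version differs.
        wav = model.generate(
            sentence,
            language_id=language_id,
            audio_prompt_path=reference_wav,
        )
        wavs.append(wav)
    return wavs
```

Passing a model created with `ChatterboxMultilingualTTS.from_pretrained(device="cuda")` and a multi-sentence German text should surface the forced-EOS log lines shown below, if the behavior reproduces on other setups.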

## Observations (from logs)

```
Sentence 1/8: terminated at sample 60/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False

Sentence 2/8: terminated at sample 156/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False

Sentence 3/8: terminated at sample 104/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False

Sentence 5/8: terminated at sample 179/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False

Sentence 7/8: terminated at sample 143/1000
  Detected 3x repetition of token 4218
  forcing EOS token, long_tail=tensor(False), alignment_repetition=tensor(False), token_repetition=True
```
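
To make the pattern in these logs easier to quantify, here is a small parsing sketch. It assumes the exact line format shown above (the `Sentence i/n: terminated at sample s/max` headers are from my own wrapper logging, so the regexes may need adjusting for other setups).

```python
import re
from collections import Counter

LOG = """\
Sentence 1/8: terminated at sample 60/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False
Sentence 2/8: terminated at sample 156/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False
Sentence 3/8: terminated at sample 104/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False
Sentence 5/8: terminated at sample 179/1000
  forcing EOS token, long_tail=tensor(True), alignment_repetition=tensor(False), token_repetition=False
Sentence 7/8: terminated at sample 143/1000
  Detected 3x repetition of token 4218
  forcing EOS token, long_tail=tensor(False), alignment_repetition=tensor(False), token_repetition=True
"""

HEADER = re.compile(r"Sentence (\d+)/(\d+): terminated at sample (\d+)/(\d+)")
# Matches both `name=tensor(True)` and `name=True`.
FLAG = re.compile(
    r"(long_tail|alignment_repetition|token_repetition)=(?:tensor\()?(True|False)\)?"
)


def parse_terminations(log: str) -> list[dict]:
    """Return one record per terminated sentence with the flags that were True."""
    events = []
    for line in log.splitlines():
        header = HEADER.search(line)
        if header:
            events.append({
                "sentence": int(header.group(1)),
                "sample": int(header.group(3)),
                "max_samples": int(header.group(4)),
                "reasons": [],
            })
        elif events and "forcing EOS" in line:
            events[-1]["reasons"] = [
                name for name, value in FLAG.findall(line) if value == "True"
            ]
    return events


events = parse_terminations(LOG)
reason_counts = Counter(r for e in events for r in e["reasons"])
print(reason_counts)
print([e["sample"] for e in events])
```

For the excerpt above, four of the five forced terminations are attributed to `long_tail` and one to `token_repetition`, and every one of them happens below sample 200.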

## Impact

When Whisper transcribes the resulting audio, phrases appear duplicated at chunk boundaries where the analyzer forced EOS. This makes the output unusable for longer texts without manual post-processing.

## Expected Behavior

The `long_tail` threshold should allow natural sentence completion before triggering. For German sentences of 50-120 characters, the model typically needs 200-400+ samples to complete. Early termination at 60-180 samples cuts off mid-generation and causes token repetition artifacts.

## Suggested Fix

Raise the `long_tail` thresholds in `alignment_stream_analyzer` so that detection is less sensitive, or make them configurable via `model.generate()` parameters. A per-language calibration might also help, as German phonetics may produce different alignment patterns than those in English training data.
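
One possible shape for this, as a purely hypothetical sketch: none of the names below (`LongTailConfig`, `config_for`, `is_long_tail`) exist in `chatterbox`, the real analyzer logic may look quite different, and the numbers are placeholders (the 200 only echoes the lower bound mentioned under Expected Behavior).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongTailConfig:
    """Hypothetical knobs for the long_tail check (not part of chatterbox)."""
    min_frames_before_check: int = 0  # never flag long_tail before this many frames
    tail_frame_limit: int = 5         # frames spent on the final text tokens before flagging


DEFAULT_CONFIG = LongTailConfig()

# Placeholder per-language overrides; real values would need calibration.
PER_LANGUAGE = {
    "de": LongTailConfig(min_frames_before_check=200, tail_frame_limit=10),
}


def config_for(language_id: str) -> LongTailConfig:
    """Return the language-specific config, falling back to the default."""
    return PER_LANGUAGE.get(language_id, DEFAULT_CONFIG)


def is_long_tail(frames_generated: int, frames_on_final_tokens: int,
                 cfg: LongTailConfig) -> bool:
    """Flag long_tail only after the floor is reached and the tail limit is hit."""
    if frames_generated < cfg.min_frames_before_check:
        return False
    return frames_on_final_tokens >= cfg.tail_frame_limit
```

A floor like `min_frames_before_check` would only suppress the forced EOS; the model could still emit EOS on its own for genuinely short sentences. Whether a frame floor, a higher tail limit, or both is the right lever is something the maintainers would know better.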

## Environment

- `chatterbox` pip package (latest as of May 2026)
- CUDA 13.0, sm_121 (NVIDIA GB10 / DGX Spark)
- Python 3.12, Linux (arm64)
- Reference audio: ~10s German male voice clip
