refactor: remove unused other_mimi MimiModel instance (~200MB memory savings) #72

Open
mvanhorn wants to merge 1 commit into NVIDIA:main from mvanhorn:refactor/46-remove-unused-other-mimi

mvanhorn commented Apr 6, 2026

Summary

Removes the second MimiModel instance (other_mimi) from server.py and offline.py. Every call to other_mimi.encode() and other_mimi.decode() assigns its result to _ and never reads it, so the primary mimi instance produces all actual output. Removing other_mimi frees roughly 200 MB of GPU memory.
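The pattern being removed can be sketched with stand-in objects. This is only an illustration, not the repository code: `FakeCodec`, `process_before`, and `process_after` are hypothetical names, and the real MimiModel works on tensors rather than Python lists.

```python
class FakeCodec:
    """Hypothetical stand-in for a codec; it is not the real MimiModel."""

    def __init__(self):
        self.calls = 0

    def encode(self, chunk):
        self.calls += 1
        return [x * 2 for x in chunk]


def process_before(mimi, other_mimi, chunk):
    """Shape of the old code path: the second instance runs, its result is dropped."""
    codes = mimi.encode(chunk)
    _ = other_mimi.encode(chunk)  # result discarded, as described in this PR
    return codes


def process_after(mimi, chunk):
    """Shape of the new code path: only the primary instance is called."""
    return mimi.encode(chunk)


chunk = [1, 2, 3]
before = process_before(FakeCodec(), FakeCodec(), chunk)
after = process_after(FakeCodec(), chunk)
assert before == after  # the returned value depends only on the primary instance
```

In this sketch the second instance costs an extra call (and, in the real code, an extra copy of the weights) without contributing to the returned value.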

Why this matters

Issue #46 identified that other_mimi loads a full copy of the Mimi codec weights, yet none of its output is ever used. On memory-constrained deployments (consumer GPUs, Jetson), 200 MB is significant.

Changes

moshi/moshi/server.py:

  • Removed other_mimi field from ServerState dataclass
  • Removed other_mimi parameter from __init__, warmup, and handle_chat
  • Removed other_mimi = loaders.get_mimi(...) instantiation in main()
  • Removed all _ = self.other_mimi.encode(chunk) and _ = self.other_mimi.decode(tokens[:, 1:9]) calls

moshi/moshi/offline.py:

  • Removed other_mimi parameter from warmup() and decode_tokens_to_pcm()
  • Removed other_mimi = loaders.get_mimi(...) instantiation in run_inference()
  • Removed all _ = other_mimi.encode(chunk) and _ = other_mimi.decode(tokens[:, 1:9]) calls

Net: -22 lines, +6 lines (signature adjustments).
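The dataclass change in server.py can be pictured roughly as follows. This is a simplified sketch: the class names `ServerStateBefore` and `ServerStateAfter` and the `lm_gen` field are placeholders chosen for illustration, and the actual ServerState may hold different fields.

```python
from dataclasses import dataclass, fields


@dataclass
class ServerStateBefore:
    mimi: object
    other_mimi: object  # second codec instance whose outputs were never used
    lm_gen: object  # hypothetical placeholder for the remaining fields


@dataclass
class ServerStateAfter:
    mimi: object
    lm_gen: object  # hypothetical placeholder for the remaining fields


# Compare the two field sets to confirm only other_mimi was dropped.
removed = {f.name for f in fields(ServerStateBefore)} - {
    f.name for f in fields(ServerStateAfter)
}
print(removed)  # {'other_mimi'}
```

Callers that construct the state then pass one argument fewer, which matches the signature adjustments counted in the added lines.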

Testing

Verified via grep -rn other_mimi --include="*.py" that zero references remain. The primary mimi encode/decode path is untouched.
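For environments without grep, an equivalent check can be sketched in Python. The helper name `find_references` is hypothetical and not part of the repository; it does a plain substring match over `.py` files, much like the grep command above.

```python
from pathlib import Path


def find_references(root, needle="other_mimi"):
    """Return (path, line_number, line) for every .py line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_references(".")` from the checkout directory should return an empty list if no references remain.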

No test suite exists in this repo.

Note for maintainer

If other_mimi was intended for a future streaming state feature or a separate voice prompt encoding path, please let me know and I'll revert. Based on the current code, all its outputs are discarded.

Fixes #46

This contribution was developed with AI assistance (Claude Code).

The second MimiModel instance (other_mimi) in server.py and offline.py
processes the same audio as the primary mimi, but every encode/decode
result is discarded (assigned to _). Removing it saves ~200MB GPU memory.

Addresses NVIDIA#46

Development

Successfully merging this pull request may close these issues.

Clarification needed: Purpose of duplicate MimiModel instance (other_mimi) in server.py
