Ternary/int8 QAT on a JAX-NNX transformer (Bonsai-style 1.58-bit)#47

Open. kmheckel wants to merge 1 commit into main from feat/ternary-llm-qat

kmheckel (Owner) commented Jul 4, 2026

Shows that spyx.quant generalizes beyond spiking nets: its BitNet-ternary and int8 QAT rules apply unchanged to a decoder-only transformer. The rules match by op on dot_general, so any nnx.Linear qualifies. This is the same 1.58-bit-weight approach as PrismML's Bonsai LLMs.

research/new/ternary_llm/ contains a tiny NNX GPT and a 3-way QAT comparison (same architecture, seed, and data):

| variant | val ppl |
| --- | --- |
| fp32 | 14.24 |
| int8 weights | 14.31 |
| ternary | 13.46 |

Ternary stays competitive with fp32 (val ppl 13.46 vs. 14.24). Quantization was verified to be genuinely active rather than a silent no-op: forward logits differ, and quantized weights take only a few discrete codes. SMOKE=1 runs the full comparison on CPU in about a minute. No new dependencies are needed (reuses spyx.quant).
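The two "is quantization really active" checks mentioned here can be sketched in plain Python. This is a minimal illustration only, not the study's verification code: `fake_quant_4code`, `count_distinct`, and `dot` are hypothetical helpers, and the quantizer is a stand-in for whatever spyx.quant / qwix actually applies.

```python
import random


def fake_quant_4code(values):
    """Hypothetical stand-in quantizer: snap to codes {-2, -1, 0, 1} times one scale."""
    scale = max(abs(v) for v in values) / 2.0
    return [max(-2, min(1, round(v / scale))) * scale for v in values]


def count_distinct(values, ndigits=9):
    """Number of distinct values after rounding away float noise."""
    return len({round(v, ndigits) for v in values})


def dot(weights, inputs):
    """Toy forward pass: one dot product standing in for a single logit."""
    return sum(w * x for w, x in zip(weights, inputs))


rng = random.Random(0)
dense = [rng.gauss(0.0, 0.02) for _ in range(1000)]  # toy full-precision weights
quant = fake_quant_4code(dense)                      # same weights on a 4-code grid
inputs = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Check 1: quantized weights collapse to a handful of discrete values.
assert count_distinct(quant) <= 4 < count_distinct(dense)

# Check 2: the forward output changes, so the quantizer is not a silent no-op.
assert dot(dense, inputs) != dot(quant, inputs)
```

If either assertion fails on a real model, the quantization rules are probably not matching any op.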

Honest caveat: qwix has no true 1.58-bit qtype, so 'ternary' here is an int2 (4-code) approximation, which is disclosed in the study README. This PR pairs with the LiteRT export PR as the two edge-efficiency LOEs.
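To make the caveat concrete, here is a rough sketch contrasting a true ternary grid with a 4-code int2 grid. `absmean_ternary` follows the absmean scheme from the BitNet b1.58 paper. `symmetric_int2` is a hypothetical abs-max-scaled 2-bit quantizer used purely for illustration; how qwix actually calibrates its int2 type is an assumption not confirmed by this PR.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """BitNet b1.58-style absmean quantizer: codes in {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def symmetric_int2(weights, eps=1e-8):
    """Hypothetical signed 2-bit grid: codes in {-2, -1, 0, 1}, abs-max scaled."""
    scale = max(abs(w) for w in weights) / 2.0 + eps
    codes = [max(-2, min(1, round(w / scale))) for w in weights]
    return codes, scale


weights = [0.8, -0.4, 0.05, -1.2, 0.25]
ternary_codes, _ = absmean_ternary(weights)
int2_codes, _ = symmetric_int2(weights)

print(ternary_codes)  # [1, -1, 0, -1, 0]
print(int2_codes)     # [1, -1, 0, -2, 0]
print(math.log2(3))   # about 1.585 bits for a 3-valued code, hence "1.58-bit"
```

The int2 grid has one extra code (-2 in this sketch) and stores a full 2 bits per weight, so it approximates ternary rather than reproducing it.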

🤖 Generated with Claude Code

…8-bit)

research/new/ternary_llm/ demonstrates spyx.quant generalizes beyond spiking nets:
its BitNet-ternary and int8 QAT (bitnet_ternary_rules / weights_only_rules, matched
by op on dot_general) apply unchanged to a tiny decoder-only transformer built from
nnx.Linear — the same 1.58-bit-weight approach as PrismML's Bonsai LLMs.

3-way QAT comparison (same arch/seed/data): fp32 ppl 14.24, int8 14.31, ternary
13.46 — ternary stays competitive with fp32. Quantization verified genuinely active
(forward logits differ; ternary weights take few discrete codes, not a no-op).
SMOKE=1 runs the full comparison on CPU in ~a minute.

Note: qwix has no true 1.58-bit qtype, so 'ternary' is an int2 (4-code) approximation
— disclosed in the study.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>