[AMD] Add MiniMax-M3-FP4 MI355X ATOMESH update 0623#1930
…els_atom.yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…om.yaml-driven) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# =============================================================================
# Model-Specific Configuration from YAML
# =============================================================================
# Load model-specific config from YAML (single parse for all fields)
eval "$(python3 -c "
import yaml
with open('${ATOM_WS_PATH}/models_atom.yaml') as f:
    m = yaml.safe_load(f).get('${MODEL_NAME}', {})
print(f'MODEL_ENVS=\"{m.get(\"env\", \"\")}\"')
print(f'MODEL_TP_DP_FLAGS=\"{m.get(\"tp_dp_flags\", \"\")}\"')
print(f'MODEL_EP_DP_FLAGS=\"{m.get(\"ep_dp_flags\", \"\")}\"')
print(f'MODEL_TP_DP_ENV=\"{m.get(\"tp_dp_env\", \"\")}\"')
print(f'MODEL_EP_DP_ENV=\"{m.get(\"ep_dp_env\", \"\")}\"')
print(f'MODEL_MTP_FLAGS=\"{m.get(\"mtp_flags\", \"\")}\"')
print(f'MODEL_KV_ARG=\"{m.get(\"kv_cache_flags\", \"\")}\"')
🔴 hf_overrides JSON quotes are stripped by bash eval, which breaks DeepSeek-V4-Pro startup. The Python f-string print(f'_HF_OVERRIDES="{m.get("hf_overrides", "")}"') interpolates the raw YAML value ({"use_index_cache":true,"index_topk_freq":4}) into an outer double-quoted bash assignment, so each inner " closes or reopens the outer string. After eval, _HF_OVERRIDES holds {use_index_cache:true,index_topk_freq:4} (unquoted keys), and --hf-overrides '{use_index_cache:true,index_topk_freq:4}' is rejected as invalid JSON. This blocks the documented test plan item "Verify server_atom.sh launches correctly for DeepSeek-V4-Pro". Fix: emit the value with shlex.quote() (or json.dumps) in the Python snippet so the JSON survives the bash eval intact.
Extended reasoning...
What the bug is
The new YAML-driven loader at server_atom.sh:80-94 runs a Python snippet whose output is captured and eval'd as bash. The hf_overrides field for DeepSeek-V4-Pro in models_atom.yaml is the JSON string {"use_index_cache":true,"index_topk_freq":4}. The emitter line is:
print(f'_HF_OVERRIDES="{m.get("hf_overrides", "")}"')
For DeepSeek-V4-Pro this prints literally:
_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"
The inner unescaped " characters terminate the outer double-quoted region. Bash treats the line as a concatenation of quoted and unquoted word segments — "{" + use_index_cache + ":true," + index_topk_freq + ":4}" — and assembles them into a single word with all the quotes stripped.
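The quote stripping described above can be approximated without a shell. This is a minimal sketch, assuming Python's `shlex` module in POSIX mode is a close enough stand-in for bash's quote handling on this one line; it is not a full bash parser.

```python
import json
import shlex

# The line the emitter prints for DeepSeek-V4-Pro (copied from the review).
emitted = '_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"'

# In POSIX mode, shlex removes the quotes and joins adjacent quoted and
# unquoted segments into a single word, mirroring what bash does here.
words = shlex.split(emitted)
value = words[0].split("=", 1)[1]
print(value)

# The surviving value is no longer valid JSON because the key quotes are gone.
try:
    json.loads(value)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False
print(is_valid_json)
```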
Step-by-step proof
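Reproduced locally with an equivalent Python emitter and bash eval (the value is bound to a variable first, because backslash-escaped quotes are not allowed inside an f-string replacement field):
$ python3 -c 'm={"hf_overrides": "{\"use_index_cache\":true,\"index_topk_freq\":4}"}; v=m.get("hf_overrides", ""); print(f"_HF_OVERRIDES=\"{v}\"")'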
_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"
$ eval '_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"' && echo "[$_HF_OVERRIDES]"
[{use_index_cache:true,index_topk_freq:4}]
$ python3 -c 'import json; json.loads("{use_index_cache:true,index_topk_freq:4}")'
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
After the eval, the downstream line
HF_OVERRIDES_ARG="--hf-overrides '${_HF_OVERRIDES}'"
produces --hf-overrides '{use_index_cache:true,index_topk_freq:4}', an invalid JSON literal with unquoted keys. The atom server's argparse / json.loads on --hf-overrides will reject this at startup.
Why existing code doesn't prevent it
The pre-PR code hard-coded the value as a bash string with backslash-escaped inner quotes:
HF_OVERRIDES_ARG="--hf-overrides '{\"use_index_cache\":true,\"index_topk_freq\":4}'"
That escaping is exactly what survives bash parsing, and it is what the new YAML-driven path loses. DeepSeek-V4-Pro is the only model in models_atom.yaml with a non-empty hf_overrides (the other models' YAML fields contain no " characters, so they are unaffected); the other emitted assignments (env, tp_dp_flags, etc.) are safe.
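Which fields are at risk can be checked mechanically. The sketch below flags any value containing a character that is special inside a bash double-quoted string. The helper name `unsafe_fields` and the example entry are hypothetical; only the `hf_overrides` value comes from the review, and the other values are placeholders.

```python
# Characters that keep a special meaning inside a bash double-quoted string.
UNSAFE_IN_DOUBLE_QUOTES = set('"$`\\')


def unsafe_fields(model_cfg):
    """Return the field names whose values would not survive VAR="value"."""
    return sorted(
        key
        for key, val in model_cfg.items()
        if UNSAFE_IN_DOUBLE_QUOTES & set(str(val))
    )


# Hypothetical entry: only hf_overrides is taken from the review text.
example_cfg = {
    "env": "EXAMPLE_FLAG=1",
    "kv_cache_flags": "--kv_cache_dtype fp8",
    "hf_overrides": '{"use_index_cache":true,"index_topk_freq":4}',
}
print(unsafe_fields(example_cfg))
```

A check like this could run in CI over models_atom.yaml, although quoting every emitted value removes the need for it.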
Impact
This regresses the dsv4-fp4-mi355x-atom-disagg recipe in amd-master.yaml (which sets MODEL_NAME=DeepSeek-V4-Pro and routes through server_atom.sh). The server will fail at startup when atom's argparse calls json.loads on the --hf-overrides argument — and this is precisely the path the PR's own test plan flags (Verify server_atom.sh launches correctly for DeepSeek-V4-Pro).
Fix
Quote the value in the Python emitter so the bash eval sees a properly escaped literal. Either use shlex.quote():
import shlex
print(f'_HF_OVERRIDES={shlex.quote(m.get("hf_overrides", ""))}')
which produces _HF_OVERRIDES='{"use_index_cache":true,"index_topk_freq":4}' (bash parses this correctly), or write each value to a NUL-delimited side channel that bash reads with read -d '' instead of evaling arbitrary Python output.
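Applying the same quoting to every emitted field, not just hf_overrides, might look like the sketch below. The `FIELDS` mapping and the function name `emit_assignments` are assumptions for illustration; the variable and key names are taken from the diff and review above, but the final script may differ.

```python
import shlex

# bash variable name -> YAML key, as they appear in the diff and the review.
# Treat the exact set as an assumption about the final script.
FIELDS = {
    "MODEL_ENVS": "env",
    "MODEL_TP_DP_FLAGS": "tp_dp_flags",
    "MODEL_EP_DP_FLAGS": "ep_dp_flags",
    "MODEL_TP_DP_ENV": "tp_dp_env",
    "MODEL_EP_DP_ENV": "ep_dp_env",
    "MODEL_MTP_FLAGS": "mtp_flags",
    "MODEL_KV_ARG": "kv_cache_flags",
    "_HF_OVERRIDES": "hf_overrides",
}


def emit_assignments(model_cfg):
    """Build one shell-safe VAR=value line per field for bash to eval."""
    lines = []
    for var, key in FIELDS.items():
        value = str(model_cfg.get(key, ""))
        # shlex.quote wraps the value in single quotes when needed, so
        # embedded double quotes reach bash unchanged.
        lines.append(f"{var}={shlex.quote(value)}")
    return lines


cfg = {"hf_overrides": '{"use_index_cache":true,"index_topk_freq":4}'}
for line in emit_assignments(cfg):
    print(line)
```

Missing keys become empty single-quoted strings, so every variable is still defined after the eval.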
  MTP : method=mtp num_speculative_tokens=${DECODE_MTP_SIZE}
  xP/yD : ${xP} / ${yD}
- KV cache : dtype=${KV_CACHE_DTYPE:-auto} block_size=${BLOCK_SIZE} mem_frac=${MEM_FRAC_STATIC}
+ KV cache : ${KV_CACHE_ARG:-none} block_size=${BLOCK_SIZE} mem_frac=${MEM_FRAC_STATIC}
🟡 Nit: line 193's INFO banner prints the literal string MTP : method=mtp num_speculative_tokens=${DECODE_MTP_SIZE}, but mtp_flags is now YAML-driven and MiniMax-M3-MXFP4/MXFP8 use --method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3. When those models run with SPEC_DECODING=mtp, the banner will misleadingly claim method=mtp. Pure log/cosmetic — the Spec args : ${SPEC_ARGS[*]} line immediately below prints the actual flags. Suggest dropping the hardcoded method=mtp (the Spec args line already covers it) or replacing with ${MODEL_MTP_FLAGS}.
Summary
- Remove the hardcoded MODEL_NAME == "DeepSeek-V4-Pro" / per-model checks from server_atom.sh and load per-model settings from models_atom.yaml, using the same python3 yaml.safe_load pattern as server_vllm.sh
- Add MiniMax-M3-MXFP4 and MiniMax-M3-MXFP8 entries to models_atom.yaml with EAGLE3 MTP flags
- minimaxm3-fp8-mi355x-atom-disagg: rocm/atom-dev:MiniMax-M3-20260622 → rocm/atom-dev:MiniMax-M3-20260623

Fields added to models_atom.yaml
- env: KEY=VALUE pairs exported unconditionally
- tp_dp_flags
- tp_dp_env
- ep_dp_flags
- ep_dp_env
- mtp_flags: … SPEC_ARGS before $DECODE_MTP_SIZE
- kv_cache_flags: --kv_cache_dtype flag string
- hf_overrides: --hf-overrides …

PR Review Checklist
🤖 Generated with Claude Code