Use non-Anthropic models (GLM-5 via z.ai) as sub-agents in Claude Code, with the exact same capabilities as native Task tool sub-agents.
v2.2.1: Added --show-prompt flag to display full prompt before execution (like native Task tool UI).
v2.2.0: Now uses an inactivity-based timeout (heartbeat pattern): long tasks run indefinitely while active; stalled tasks are detected and killed quickly.
Claude Code's native Task tool spawns sub-agents, but:
- Sub-agents are Anthropic models only (expensive per-token pricing)
- No way to use cheaper alternatives like GLM-5
This wrapper solves both problems by spawning claude -p (headless mode) with z.ai environment variables, giving you:
- ALL native Claude Code tools (Read, Write, Edit, Glob, Grep, Bash, WebFetch, etc.)
- GLM-5 as the model ($30/mo unlimited via z.ai)
- Same behavior as native sub-agents
npm install -g @anthropic-ai/claude-code
Subscribe to GLM Coding Plan ($30/mo unlimited)
export ZAI_API_KEY="your-api-key-here"
python subagent_template.py --task "Fix the bug in auth.py" --cwd /path/to/project --stream
| File | Description |
|---|---|
| subagent_template.py | Use this - Template |
| README.md | Documentation |
| LICENSE | MIT License |
python subagent_template.py --task "Your task description" --cwd /path/to/project
python subagent_template.py --task "Implement feature X" --cwd /path/to/project --stream
--task Task description (required)
--cwd Working directory (default: current)
--inactivity-timeout Kill if no tool use for N seconds (default: 90)
--max-timeout Optional hard ceiling in seconds (default: unlimited)
--stream Show tool names as they execute
--show-prompt Display full prompt before execution (like native Task UI)
--max-budget Max cost in USD
--allowed-tools Comma-separated list of allowed tools
--debug Write debug logs to /tmp/glm-subagent-debug.log
Uses inactivity-based timeout instead of global timeout:
| Scenario | Behavior |
|---|---|
| Agent actively using tools | Timer resets on each tool use → runs indefinitely |
| Agent stalls (no tool use) | Killed after --inactivity-timeout seconds |
| Very long task | Use --max-timeout as safety ceiling |
This follows the heartbeat pattern recommended by Temporal and AWS for agentic workflows.
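The heartbeat idea can be sketched roughly as follows. This is a minimal illustration, not the code in subagent_template.py: the function name `run_with_inactivity_timeout` is hypothetical, and for simplicity it treats any output line as activity, whereas the wrapper resets its timer on tool use.

```python
import subprocess
import sys
import threading
import time


def run_with_inactivity_timeout(argv, inactivity_timeout=90.0, max_timeout=None):
    """Run argv and kill it if it stays silent for inactivity_timeout seconds.

    Hypothetical helper: any stdout line counts as a heartbeat here.
    """
    proc = subprocess.Popen(
        argv, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE, text=True
    )
    start = time.monotonic()
    last_activity = [start]  # mutable cell shared with the reader thread
    lines = []

    def reader():
        for line in proc.stdout:
            last_activity[0] = time.monotonic()  # heartbeat resets the timer
            lines.append(line.rstrip("\n"))

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    reason = None
    while proc.poll() is None:
        now = time.monotonic()
        if now - last_activity[0] > inactivity_timeout:
            reason = "inactivity"
        elif max_timeout is not None and now - start > max_timeout:
            reason = "max_timeout"
        if reason:
            proc.kill()  # stalled or over the hard ceiling
            break
        time.sleep(0.05)

    proc.wait()
    thread.join(timeout=1.0)
    return {"timed_out": reason, "lines": lines}
```

With this shape, a child that keeps printing is never killed by the inactivity check, while a silent child is stopped after roughly `inactivity_timeout` seconds; `max_timeout` acts as the optional hard ceiling.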
The wrapper returns clean JSON that your orchestrator can parse:
{"success": true, "result": "Task completed successfully...", "error": null}
[subagent] a1b2c3d4e5 starting cwd=/path/to/project
[subagent] task: Implement feature X...
[subagent] a1b2c3d4e5 🔧 Glob
[subagent] a1b2c3d4e5 🔧 Read
[subagent] a1b2c3d4e5 🔧 Edit
[subagent] a1b2c3d4e5 ✅ complete
[subagent] a1b2c3d4e5 ✅ success
[subagent] result: Implemented feature X by...
[subagent] logs: /tmp/glm-native-subagent/run_a1b2c3d4e5.stream.jsonl
{"success": true, "result": "Implemented feature X by...", "error": null}
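An orchestrator script could recover the result from captured output along these lines. This is a sketch that assumes the final JSON object is the last stdout line beginning with `{`, as in the sample above; `parse_wrapper_output` is a hypothetical helper, not part of the wrapper.

```python
import json


def parse_wrapper_output(stdout_text):
    """Return the wrapper's result dict from captured stdout.

    Scans from the end so progress lines such as "[subagent] ..." are skipped.
    """
    for line in reversed(stdout_text.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not; keep scanning upward
        if isinstance(payload, dict) and "success" in payload:
            return payload
    # Fall back to the same three-key shape so callers need only one code path.
    return {"success": False, "result": None, "error": "no JSON result found in output"}
```

Returning the same three keys on failure means callers can always check `payload["success"]` without special-casing missing output.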
[subagent] a1b2c3d4e5 starting cwd=/path/to/project
Prompt: ← Gray colored label
TASK: Update roleProvisioning.service.ts
You need to update the role provisioning configuration
so that Sales Managers automatically get proper permissions.
File Location:
/Users/Gabe/Dev/project/src/services/roleProvisioning.service.ts
[subagent] a1b2c3d4e5 🔧 Read
[subagent] a1b2c3d4e5 🔧 Edit
...
┌─────────────────────────────────────────────────────────────────┐
│  Your Claude Code Session (Orchestrator)                        │
│                                                                 │
│  "Implement the EmailInbox component"                           │
│                                                                 │
│  [Bash] python subagent_template.py --task "..." --stream       │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  subagent_template.py                                           │
│                                                                 │
│  1. Sets ANTHROPIC_BASE_URL to z.ai endpoint                    │
│  2. Spawns: claude -p --output-format stream-json               │
│  3. Closes stdin immediately (prevents hang)                    │
│  4. Streams tool names to stdout (token-efficient)              │
│  5. Writes full stream to log file (debuggability)              │
│  6. Returns JSON result                                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  Claude Code CLI (GLM-5 via z.ai)                               │
│                                                                 │
│  - Fresh 200k token context window                              │
│  - ALL native tools available                                   │
│  - Permissions inherited (--dangerously-skip-permissions)       │
│  - Session not persisted (isolation)                            │
└─────────────────────────────────────────────────────────────────┘
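Steps 1 to 3 of the middle box might look roughly like this. It is a sketch under stated assumptions: the helper names are hypothetical, only flags named in the diagram are used, and the real script may pass additional arguments.

```python
import os
import subprocess
import sys


def build_env(base_url, api_key, parent_env=None):
    """Copy the parent environment and point the CLI at another endpoint."""
    env = dict(os.environ if parent_env is None else parent_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    return env


def build_claude_command(task):
    """Argv for a headless run, limited to the flags shown in the diagram."""
    return [
        "claude", "-p", task,
        "--output-format", "stream-json",
        "--dangerously-skip-permissions",
    ]


def spawn_with_closed_stdin(argv, env):
    """Start argv, close its stdin at once, and return (output, returncode)."""
    proc = subprocess.Popen(
        argv,
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # single pipe, so a plain read() cannot deadlock
        text=True,
    )
    proc.stdin.close()  # child sees EOF immediately instead of waiting for input
    output = proc.stdout.read()
    proc.stdout.close()
    return output, proc.wait()
```

Closing stdin right after spawning is what keeps a child that reads from stdin from blocking forever; the child simply sees end-of-file.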
When Claude Code calls a Bash command, output is captured and has limits:
- UI collapses long output
- Hard size limit (~30k chars) causes silent truncation
Old approach (broken):
stream-json events → stdout → Bash captures → TRUNCATED!
New approach (fixed):
stream-json events → log FILE (full fidelity)
tool names only → stdout (tiny, ~10 lines)
final JSON → stdout (always present)
In-place spinner updates (\r) don't work when output is captured by the Bash tool. Each update becomes a new line, producing hundreds of lines.
Fix: Detect if stdout is a tty. Skip spinner if being captured:
if not sys.stdout.isatty():
    return  # Skip spinner
Streaming all events to stdout bloats the orchestrator's context.
Fix: Only print high-signal information:
- Task start/end
- Tool names (deduplicated)
- Result preview (truncated to 200 chars)
- Pointer to full logs
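A compact summary along the lines of the list above could be built as follows. This is an illustrative sketch: the helper names are hypothetical, the deduplication is assumed to be order-preserving across the whole run, and the line format only approximates the sample output.

```python
def dedupe_preserving_order(names):
    """Drop repeated tool names while keeping first-seen order."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


def preview(text, limit=200):
    """Collapse whitespace and cut text to at most `limit` characters."""
    text = " ".join(text.split())  # keep the preview on a single line
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def summary_lines(run_id, tools, result, log_path):
    """High-signal lines only: tools used, short result, pointer to full logs."""
    lines = [f"[subagent] {run_id} tool: {name}" for name in dedupe_preserving_order(tools)]
    lines.append(f"[subagent] result: {preview(result)}")
    lines.append(f"[subagent] logs: {log_path}")
    return lines
```

The orchestrator's context then grows by a handful of lines per run, whatever the size of the sub-agent's actual output.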
Full output is always saved to /tmp/glm-native-subagent/:
run_{id}.stream.jsonl # All stream-json events (when --stream)
run_{id}.stdout.txt # Raw stdout from subprocess
run_{id}.stderr.txt # Raw stderr from subprocess
Add this to your CLAUDE.md:
# Use GLM Sub-Agents (Not Native Task Tool)
For sub-agent work, use GLM wrapper via Bash:
\`\`\`bash
python /path/to/subagent_template.py --task "task" --cwd "$(pwd)" --stream
\`\`\`
Returns JSON: `{"success": bool, "result": "...", "error": null}`. Logs at `/tmp/glm-native-subagent/`.
| Feature | Native Task Tool | This Wrapper |
|---|---|---|
| Model | Anthropic only | Any (GLM-5 via z.ai) |
| Cost | Per-token ($$$) | Flat rate ($30/mo) |
| Tools | All native | All native |
| Context isolation | ✓ | ✓ |
| Progress UI | Native panel | Terminal output |
| Truncation risk | None | None (fixed in v2.1) |
| Timeout | Global only | Inactivity-based (v2.2) |
| Feature | MCP Plugin | This Wrapper |
|---|---|---|
| Context cost | Schema in every prompt | Zero |
| Hot refresh | Requires restart | Instant |
| Build step | npm build | None |
| Complexity | TypeScript, plugin install | Single Python file |
Set ZAI_API_KEY or ANTHROPIC_AUTH_TOKEN environment variable.
The wrapper closes stdin immediately. If you're modifying the code, ensure:
proc.stdin.close()  # CRITICAL
The default inactivity timeout is 90s. Increase it with --inactivity-timeout 300.
Install: npm install -g @anthropic-ai/claude-code
| Variable | Required | Description |
|---|---|---|
| ZAI_API_KEY | Yes* | z.ai API key |
| ANTHROPIC_AUTH_TOKEN | Yes* | Alternative to ZAI_API_KEY |
| ZAI_BASE_URL | No | Custom API endpoint |
| ANTHROPIC_BASE_URL | No | Alternative to ZAI_BASE_URL |
*One of ZAI_API_KEY or ANTHROPIC_AUTH_TOKEN is required.
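Resolving these variables might look like the sketch below. The precedence (the `ZAI_*` names checked first) is an assumption, `resolve_credentials` is a hypothetical helper, and the wrapper's built-in default endpoint is deliberately not reproduced here.

```python
import os


def resolve_credentials(env=None):
    """Return (api_key, base_url) from the environment.

    base_url is None when no override is set, leaving the default to the caller.
    """
    env = os.environ if env is None else env
    api_key = env.get("ZAI_API_KEY") or env.get("ANTHROPIC_AUTH_TOKEN")
    if not api_key:
        raise RuntimeError("Set ZAI_API_KEY or ANTHROPIC_AUTH_TOKEN")
    base_url = env.get("ZAI_BASE_URL") or env.get("ANTHROPIC_BASE_URL")
    return api_key, base_url
```

Failing early with a clear message when neither key is set avoids spawning a sub-agent that would only fail later with an authentication error.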
| Requested | Actual |
|---|---|
| claude-opus | GLM-5 |
| claude-sonnet | GLM-5 |
| claude-haiku | GLM-5-Air |
MIT