Project: chini-bench-cli Version: 0.6.0 Last updated: April 24, 2026 Contact: squeak@chinilla.com
Email squeak@chinilla.com with the subject line chini-bench-cli security.
Please include:
- A description of the issue and its impact.
- Steps to reproduce, or a proof-of-concept.
- The version (
chini-bench --version) and the commit hash if you have it. - Any logs or canvas JSON involved (redact your API keys).
Please do not open a public GitHub issue first. I will reply within 7 days.
If the report is a real, fixable vulnerability I will:
- Acknowledge receipt within 7 days.
- Confirm the issue and ship a fix in the next patch release.
- Credit the reporter in the release notes (unless asked not to).
There is no bug bounty.
Only the latest minor version receives security fixes.
| Version | Supported |
|---|---|
| 0.6.x | Yes |
| 0.5.x | No (please upgrade) |
| < 0.5 | No |
The CLI is intentionally minimal. The full attack surface is:
- Read environment variables for API keys.
- Make HTTPS calls to one model provider you choose at runtime.
- POST a JSON canvas to the bench server (
https://chinilla.com/api/bench/*) when, and only when, you explicitly runsubmitorrun.
There is no daemon, no background process, no auto-update, no telemetry, no account system, no credential cache.
The CLI parses model output as JSON only (json.loads, never eval). A model
that returns malformed or oversized output should fail closed: a parse error,
a schema validation rejection, or a 4xx from the bench server. If you find an
output payload that escapes those checks, report it.
The dependency surface is small (requests, questionary, pydantic, plus
the optional provider SDKs you opt into). pip-audit runs against
requirements.txt on every release. If you find a transitive dep with a known
CVE that we missed, report it.
- Issues that require attacker control of your local machine (env vars,
filesystem, terminal). If they have your
OPENAI_API_KEYthey have already won; the CLI is not a defense layer. - Bench server vulnerabilities (
https://chinilla.com/api/bench/*). Report those to the same address but mark them as "bench server" in the subject; they are tracked in a separate repository. - Rate-limit bypass against model providers. You are responsible for your own API quota.
- Cosmetic issues, broken links, typos. Open a normal GitHub issue.
Automated checks run on every release. Reproduce locally:
pip install bandit pip-audit
bandit -r chini_bench
pip-audit -r requirements.txtLatest results (v0.6.0, 2026-04-24, 864 LOC):
| Tool | Result |
|---|---|
bandit -r chini_bench |
0 issues, all severities |
pip-audit -r requirements.txt |
No known vulnerabilities |
- No
subprocess,os.system,eval,exec,pickle, orshell=Trueanywhere in the package. - No SSL verification disabled. All
requestscalls use defaults plus an explicit 60s timeout. - API keys come from environment variables only. The CLI never writes them to disk, never logs them, and never sends them to anything except the matching provider's official SDK or REST endpoint.
- LLM output is parsed as JSON only (
json.loads), nevereval'd. - The bench server validates everything the CLI sends (submitter regex, model regex, canvas schema). The CLI is not a trusted client; the server treats it as untrusted input.
- No telemetry. The only outbound calls are to your chosen provider and to
https://chinilla.com/api/bench/*when you explicitly invokesubmitorrun. - Local result archive in
data/runs/stores only the canvas, the submitter handle, and the bench server response. No keys, no env, no PII.
You can verify that your installed CLI's harness prompts have not been modified upstream:
python -c "from chini_bench.prompt import system_prompt_hash, system_prompt_hash_reflex; \
print('single-shot:', system_prompt_hash()); \
print('reflex: ', system_prompt_hash_reflex())"Canonical hashes for v0.6.0:
chini-bench-cli:06d0ffb42f19chini-bench-reflex:42769353289d
A mismatch means either the package on your machine has been modified, or you
have intentionally edited SYSTEM_PROMPT (in which case your runs will surface
as custom on the leaderboard, not default, which is the intended behavior).