chore(clawker-support): add skill eval test cases for the support skill (#358)

## Summary

The `clawker-support` skill (`claude-plugin/clawker-support/skills/clawker-support/`) has no eval test cases. Changes to `SKILL.md` or the `reference/` files are currently verified only by manual review, so there is no way to measure whether an edit improves or regresses the skill's triggering accuracy or answer quality.

Add a skill eval suite so the skill can be benchmarked with the skill-creator eval tooling (`/skill-creator` supports running evals, benchmarking performance with variance analysis, and optimizing the description for triggering accuracy).

## Why

- The skill is delivered to external users via the plugin marketplace, so every patch bump ships behavior changes that no eval has checked.
- The skill's core design principle is "minimal concrete details, look up sources just in time", which is easy to break silently (e.g. an edit that makes the agent answer from memory instead of fetching docs).
- Triggering is intentionally broad: the skill fires on `.clawker.yaml`, blocked domains, build errors, post_init, etc. without the word "clawker". Description edits therefore risk both under- and over-triggering, which evals can catch.

## Scope

Create an `evals/` directory for the skill with test cases covering at least:

**Triggering accuracy**
- Positive: questions that should trigger without saying "clawker" (e.g. "my domain is blocked", "why does my .clawker.yaml not add this package", "post_init didn't run")
- Negative: generic Docker/firewall questions that should NOT trigger the skill
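
As a rough illustration of the triggering cases above, the sketch below labels a few prompts and reports which ones under- or over-trigger. The `TriggerCase` shape, the two negative prompts, and the `observed` mapping are all hypothetical; the real case schema is whatever the skill-creator eval format defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerCase:
    # Hypothetical shape; the real skill-creator eval schema may differ.
    prompt: str
    should_trigger: bool


CASES = [
    # Positive prompts taken from the issue text.
    TriggerCase("my domain is blocked", True),
    TriggerCase("why does my .clawker.yaml not add this package", True),
    TriggerCase("post_init didn't run", True),
    # Illustrative generic Docker/firewall prompts (made up for this sketch).
    TriggerCase("how do I write a multi-stage Dockerfile", False),
    TriggerCase("how do I open a port with ufw", False),
]


def score(cases, observed):
    """Return (under_triggered, over_triggered) prompt lists.

    `observed` maps each prompt to whether the skill actually fired.
    """
    under = [c.prompt for c in cases if c.should_trigger and not observed[c.prompt]]
    over = [c.prompt for c in cases if not c.should_trigger and observed[c.prompt]]
    return under, over
```

Keeping under- and over-triggering as separate lists, rather than a single accuracy number, makes it easier to see which direction a description edit pushed the skill.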

**Answer quality / methodology adherence**
- Config-level awareness: project-specific config must land in project-level `.clawker.yaml`, never user-level (`~/.config/clawker/clawker.yaml`)
- Source discipline: recommendations grounded in fetched docs/reference files, not training data
- Firewall path-scoping: narrowest-scope rule recommended (`--path`/`--action`, `path_rules`) rather than bare domain allows
- Troubleshooting routing: symptom → correct domain reference file (build vs firewall vs MCP vs monitoring vs known-issues)
- Known-issues lookup: a reported symptom that matches `reference/known-issues.md` should surface the documented workaround
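
One of the methodology checks above, config-level awareness, could be approximated with a simple string check on the agent's answer. This is a deliberately crude sketch: `check_config_level` is a hypothetical helper, and a substring match cannot tell a recommendation from a warning, so a real grader would likely need to judge intent.

```python
USER_LEVEL_CONFIG = "~/.config/clawker/clawker.yaml"
PROJECT_LEVEL_CONFIG = ".clawker.yaml"


def check_config_level(answer: str) -> list[str]:
    """Return failure messages; an empty list means the check passed.

    Hypothetical grader: it only looks for the two config paths named in
    the issue and does not interpret the surrounding sentence.
    """
    failures = []
    if USER_LEVEL_CONFIG in answer:
        failures.append("answer points at the user-level config")
    if PROJECT_LEVEL_CONFIG not in answer:
        failures.append("answer never mentions the project-level config")
    return failures
```

The same pattern (one small function per rule, each returning a list of failures) would extend to the other methodology cases without coupling them together.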

**Regression cases**
- Past failure modes worth pinning (e.g. recommending user-level config pollution, inventing config field names, hardcoding firewall domains)

## Acceptance criteria

- [ ] Eval cases live alongside the skill and follow the skill-creator eval format
- [ ] At least one positive and one negative triggering case, plus the methodology cases listed under Scope
- [ ] Baseline run documented (so future SKILL.md edits can be compared)
- [ ] `claude-plugin/clawker-support/CLAUDE.md` completion gate updated to mention running evals after skill changes
- [ ] Patch version bump per plugin versioning policy
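
To make the documented baseline comparable with later runs, one option is to record pass rates from several repeated runs and flag a candidate whose mean falls well below the baseline spread. The sketch below assumes pass rates are floats in [0, 1]; the `regressed` helper and its `tolerance` parameter are hypothetical and not part of the skill-creator tooling.

```python
from statistics import mean, stdev


def summarize(pass_rates):
    """Mean and sample standard deviation of pass rates from repeated runs.

    Needs at least two runs, because the sample standard deviation is
    undefined for a single value.
    """
    return mean(pass_rates), stdev(pass_rates)


def regressed(baseline, candidate, tolerance=1.0):
    """True if the candidate mean is more than `tolerance` baseline
    standard deviations below the baseline mean."""
    base_mean, base_sd = summarize(baseline)
    return mean(candidate) < base_mean - tolerance * base_sd
```

Comparing against the baseline's own run-to-run spread, rather than a fixed threshold, keeps ordinary noise from being reported as a regression.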
