Lint your AI harness so weak small models can actually run it. Finds the phrasing that breaks Minimax 2.5, Nemotron 3, Mistral 7B, and local models — then rewrites it with a smart model.
You author a harness (opencode modes, Claude Code agent files, Cursor rules,
plain .md skills) with a frontier model. You want to run it on a cheap
small model. The logic is fine — but the prose is full of phrases
that only a frontier model knows how to interpret:
should, when relevant, one of the usual categories, the table above,
creative, leveraged. Small models drop clauses, invent items, ignore
soft imperatives, and blow past taste words.
Isolint fixes that:
┌─────────────────┐            ┌─────────────────────┐            ┌─────────────────┐
│  Your harness   │  isolint   │ Lint report + diff  │   --fix    │  Same harness,  │
│   (.md files)   │ ────────▶  │  of every phrase    │ ────────▶  │  small-model-   │
│                 │            │  a 7B model misses  │            │   safe prose    │
└─────────────────┘            └─────────────────────┘            └─────────────────┘
Isolint also ships an Isomorphic Plan engine (isolint plan / run) —
a large model emits a strict JSON plan; a small model executes it with
schema validation. The linter uses this engine internally for its
LLM-assisted rules. Use it directly when you want a fully-deterministic
pipeline instead of markdown prose.
Most "agent" systems push all complexity into runtime prose: long system prompts, sprawling instructions, taste-based validation. 7B-class models collapse under that weight — not because the logic is wrong, but because the prose is ambiguous.
The default `recommended` preset scans for **28 deterministic rule patterns + 5 LLM-assisted rules**, each targeting one concrete small-model failure mode. An optional `performance` preset adds 18 advisory rules for harness overhead: repeated instructions, oversized examples, redundant contracts, low-value prose, saturated emphasis, mode-conditional branches in shared prefixes, cross-file duplication, and other avoidable token/latency costs. Every finding is a fixable phrase. Every fix preserves intent and markdown formatting, and every LLM rewrite is re-linted before being applied so bad fixes never ship.
src/
├── lint/ # Markdown harness linter (deterministic + LLM-assisted)
│ ├── rules/ # Individual rules (soft-imperative, taste-word, ...)
│ ├── fix.ts # Deterministic fix engine + LLM rewriter
│ └── ...
├── schema/ # Strict Plan schema (the contract)
├── planner/ # Large-model → Plan (JSON)
├── runtime/ # Small-model executor (validates every step)
├── providers/ # OpenAI / OpenRouter / Ollama / Mock
├── util/ # JSON extraction, logger
└── cli/ # isolint lint | verify | plan | run | validate
If you use Claude Code / Codex / Cursor to author a harness and then run it on minimax / nemotron / local models, the #1 failure mode is not logic — it's prose the small model can't parse. The linter catches that.
Every rule targets a concrete failure mode in weak models. All rules run against prose; fenced code blocks, inline code, and HTML comments are skipped.
| Rule | Catches | Severity |
|---|---|---|
| `soft-imperative` | should, could, might, consider, ideally; weak models drop these | warn |
| `vague-quantifier` | some, several, a few, many, most without a number | warn |
| `taste-word` | creative, engaging, passionate, leveraged, cutting-edge, etc. | warn |
| `ambiguous-deictic` | the table above, as mentioned below; position-based refs | info |
| `double-negation` | don't forget to not skip…; weak models flip the sign | warn |
| `long-sentence` | Sentences > 35 words; weak models drop clauses | info |
| `pronoun-no-antecedent` | It, This, They at start of block after a list/heading | info |
| `enum-without-list` | one of the usual categories without the list inline | warn |
| `word-count-target` | write 100 words; weak models count tokens poorly | info |
| `implicit-conditional` | when relevant, if appropriate, as needed | warn |
| `trailing-etc` | A, B, C, etc.; unclosed set, weak models invent items | warn |
| `heading-without-imperative` | Mode headings without an action verb | info |
| `nested-conditional` | Multiple if / unless / except in one sentence | warn |
| `multiple-output-formats` | "return JSON and a summary" in one step | warn |
| `placeholder-leftover` | TODO, FIXME, `<insert X>`, `[INSERT X]`; weak models echo scaffolding | warn |
| `output-format-no-example` | "return JSON" with no example, schema, or field list nearby | warn |
| `numbered-step-gap` | 1. … 2. … 4. …; weak models inherit the gap and skip the step | warn |
| `step-without-verb` | `## Step N` body that opens with a noun, not an imperative verb | info |
| `undefined-step-reference` | Prose mentions Step 5 when only 3 steps exist across the harness | warn |
| `missing-file-reference` | Read cv.md when cv.md doesn't exist in the repo | warn |
| `context-budget` | Harness file over the prose-length budget; weak models drop the middle | info / warn |
| `dangling-variable-reference` | `$input.X` / `$steps.Y.output` with no declared source; weak models hallucinate the value | warn |
| `invalid-json-fence` | A `json` fence whose body doesn't parse as JSON; weak models copy the malformed shape | warn |
| `heading-hierarchy` | Skipped heading levels (`# A` → `### B`); weak models use depth as structure | info |
| `stale-link-reference` | `[label](./missing.md)` where the target isn't in the repo | warn |
| `table-column-mismatch` | Table rows with different cell counts than the header | warn |
| `mixed-list-marker` | One list mixes `-` and `*` markers; weak models split on the change | info |
| `frontmatter-schema` | Harness frontmatter missing required fields (Claude Code `description`, Cursor `globs`/`alwaysApply`) | warn |
Plus five LLM-assisted rules (opt-in via --llm) that use a smart
model with JSON-mode output for checks that need judgment:
| Rule | Catches |
|---|---|
| `llm-atomicity` | Instructions bundling multiple actions into one sentence |
| `llm-implicit-context` | Phrases that rely on context not present in the file |
| `llm-unexplained-schema` | Output contracts in prose with no example / schema nearby |
| `llm-tone-drift` | Authoritative imperatives followed by casual phrasing in the same section |
| `llm-implicit-assumption` | Sentences that reference entities with no prior definition |
# Scan any directory of .md / .mdc / .mdx files — reliability + performance by default
npx @agent-pattern-labs/isolint lint /path/to/harness
# Reliability rules only (opt out of performance)
npx @agent-pattern-labs/isolint lint /path/to/harness --preset recommended
# Performance rules only (advisory, info-severity)
npx @agent-pattern-labs/isolint lint /path/to/harness --preset performance
# See how many tokens your harness costs per turn (shared prefix + per-mode + per-agent)
npx @agent-pattern-labs/isolint cost /path/to/harness
# Just your opencode modes / Claude Code agents / Cursor rules
npx @agent-pattern-labs/isolint lint .opencode/skills .opencode/agents modes .cursor/rules
# Only lint files changed since main (great for PR CI)
npx @agent-pattern-labs/isolint lint . --since origin/main --fail-on warn
# JSON for CI / GitHub annotations (SARIF)
npx @agent-pattern-labs/isolint lint ./modes --format sarif > lint.sarif
# Auto-fix everything the rules can fix deterministically + via LLM rewrites
export OPENROUTER_API_KEY=sk-or-...
npx @agent-pattern-labs/isolint lint ./modes --fix --llm --large anthropic/claude-3.5-sonnet
# Dry-run: see the diff, don't write files
npx @agent-pattern-labs/isolint lint ./modes --fix --llm --diff

`.md`, `.mdc` (Cursor rules), and `.mdx` are all picked up by default. YAML
(`---`) / TOML (`+++`) frontmatter at the top of any file is skipped.
When run inside a git repo, isolint honors .gitignore (plus
.git/info/exclude and any global git excludes) so build output like
generated CLAUDE.md / AGENTS.md / .cursor/rules/ copies don't get
linted. Pass --no-gitignore to disable.
Any file can silence findings with HTML comments:
<!-- isolint-disable-next-line taste-word -->
These words (leveraged, cutting-edge, passionate) are ATS red flags.
<!-- isolint-disable -->
This whole block is documentation about banned words, not an instruction.
- leveraged
- utilized
- spearheaded
<!-- isolint-enable -->

Omitting the rule ids suppresses all rules on the target line(s).
Both .isolint.json and the legacy .isomodel-lint.json are read. Shape:
{
"extends": ["recommended"],
"rules": {
"pronoun-no-antecedent": "off",
"long-sentence": "warn"
},
"ignore": ["docs/archive/**", "*.draft.md"],
"options": {
"long-sentence.max_words": 40,
"taste-word.extra": ["bespoke", "revolutionary"]
},
"skip_spans": {
"quoted_strings": true,
"quoted_strings_max_chars": 40
},
"custom_rules": [
{
"id": "no-acme-brand",
"pattern": "\\bAcme\\s+Corp\\b",
"severity": "warn",
"message": "Use 'ACME Inc' instead of 'Acme Corp'"
}
]
}

`custom_rules[]` lets a team codify its own banned phrases without patching
isolint. Each spec takes `id` (must not collide with a built-in rule),
`pattern` (regex source), optional `flags` (default `gi`), optional
`severity` (default `warn`), and `message`. Invalid specs are logged to
stderr and skipped, so one bad pattern never breaks a whole run.
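To make the `custom_rules[]` contract concrete, here is a minimal sketch of how such a spec could be compiled and applied. It is not isolint's actual implementation: the `CustomRuleSpec` and `Finding` types and the `runCustomRules` helper are hypothetical names, and only the field names and defaults come from the description above.

```typescript
// Hypothetical shape mirroring the custom_rules[] fields described above.
interface CustomRuleSpec {
  id: string;
  pattern: string; // regex source
  flags?: string; // default "gi", per the text above
  severity?: string; // default "warn", per the text above
  message: string;
}

interface Finding {
  ruleId: string;
  severity: string;
  message: string;
  index: number;
  match: string;
}

// Compile one spec. An invalid pattern is logged to stderr and skipped.
function compileRule(spec: CustomRuleSpec): RegExp | null {
  try {
    const flags = spec.flags ?? "gi";
    // The scan loop below needs the global flag, so add it if it is missing.
    return new RegExp(spec.pattern, flags.indexOf("g") >= 0 ? flags : flags + "g");
  } catch (err) {
    console.error(`custom rule "${spec.id}" skipped: ${(err as Error).message}`);
    return null;
  }
}

function runCustomRules(specs: CustomRuleSpec[], text: string): Finding[] {
  const findings: Finding[] = [];
  for (const spec of specs) {
    const re = compileRule(spec);
    if (re === null) continue; // one bad pattern never breaks the run
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      findings.push({
        ruleId: spec.id,
        severity: spec.severity ?? "warn",
        message: spec.message,
        index: m.index,
        match: m[0],
      });
      if (m[0].length === 0) re.lastIndex++; // avoid looping forever on empty matches
    }
  }
  return findings;
}

const specs: CustomRuleSpec[] = [
  { id: "no-acme-brand", pattern: "\\bAcme\\s+Corp\\b", message: "Use 'ACME Inc' instead of 'Acme Corp'" },
  { id: "broken", pattern: "(unclosed", message: "never reported" }, // invalid on purpose
];
const findings = runCustomRules(specs, "Write to acme corp, then cc Acme  Corp.");
console.log(findings.map((f) => `${f.ruleId}@${f.index}: ${f.match}`));
```

Because the default flags include `i`, both the lowercase and the capitalised spelling are reported, while the deliberately broken second spec is skipped.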
Presets:
- `recommended`: deterministic reliability rules only; safe for CI.
- `strict`: `recommended` + all five LLM-assisted rules; requires `--llm`.
- `performance`: 18 advisory deterministic rules for harness efficiency.
Default (no config, no --preset): recommended + performance runs
together. Performance findings are info severity so CI exit codes
under the default --fail-on error are unaffected. Set
{"extends": ["recommended"]} in .isolint.json or pass
--preset recommended to keep only reliability rules.
Combine performance with either reliability preset, either via config:
{
"extends": ["recommended", "performance"]
}

Or directly on the command line with `--preset` (repeatable, or comma-separated):
npx @agent-pattern-labs/isolint lint . --preset recommended --preset performance
# equivalent to:
npx @agent-pattern-labs/isolint lint . --preset recommended,performance

`--preset` overrides the config's `extends` when set, which is useful for one-off
runs without editing `.isolint.json`. Valid values: `recommended`,
`strict`, `performance`.
With --fix --llm, the performance preset can rewrite duplicated output
contracts and redundant schema prose. perf-style-tone-overhead also has
a deterministic fix for simple trailing tone/style suffixes.
The performance preset adds:
- `perf-repeated-instruction-block`
- `perf-example-heavy-section`
- `perf-duplicated-output-requirement`
- `perf-step-restates-prior-step`
- `perf-low-value-prose-section`
- `perf-shared-prefix-budget`
- `perf-large-example-in-shared-prefix`
- `perf-long-runbook-in-shared-prefix`
- `perf-redundant-schema-prose`
- `perf-structured-output-explanation`
- `perf-style-tone-overhead`
- `perf-mirrored-agent-spec`
- `perf-rationale-in-shared-prefix`: `Why:` / `Historical note:` / dated incident narratives in always-loaded files
- `perf-emphasis-inflation`: saturated MUST / NEVER / CRITICAL density (weak models ignore saturated emphasis)
- `perf-cross-file-duplicate-block`: same paragraph copy-pasted verbatim across ≥2 harness files
- `perf-dense-prohibition-list`: 3+ consecutive `Do not X. Never Y. Must not Z.` sentences that should be a bullet list
- `perf-conditional-mode-branch-in-shared-prefix`: "When the orchestrator dispatches an `apply`…" branches that belong in the mode's own file
- `perf-nested-conditional-chain`: sentences chaining 3+ `if` / `when` / `unless` conditions that weak models can't track
Lint tells you what's wasteful; cost tells you how much you're paying.
# Quick view: always-loaded baseline + per-mode + per-agent breakdown
npx @agent-pattern-labs/isolint cost /path/to/harness
# Fail CI if the always-loaded cost exceeds a budget
npx @agent-pattern-labs/isolint cost /path/to/harness --budget 10000
# Machine-readable, for scripts / dashboards
npx @agent-pattern-labs/isolint cost /path/to/harness --format json

The command reports cost per tool load group because different tools load different subsets of the shared-prefix files; you pay one tool's bundle per turn, not all of them summed together:
- Claude Code loads `CLAUDE.md`.
- AGENTS.md convention (opencode / Codex CLI / Zed) loads `AGENTS.md` + `modes/_shared.md` + `.opencode/instructions.md` when those exist.
- Cursor loads `.cursor/rules/*.mdc` (frontmatter-aware `alwaysApply` filtering is not yet implemented, so the total is a ceiling).
- Per-mode files (`modes/<name>.md`) load only when that mode runs.
- Per-agent files (`.claude/agents/`, `.opencode/agents/`, `iso/agents/`) load only when the orchestrator dispatches.
iso/instructions.md is authoring source — it compiles to the tool-specific files at build time. When a tool's file isn't in the repo, iso stands in as the compiled-content equivalent so each tool's cost reflects what you actually pay at runtime.
For shared-prefix files, the text report breaks down the biggest sections so you can see exactly where the budget is going, then shows the per-tool totals:
Shared-prefix files (section breakdown)
~8,072 tokens iso/instructions.md 4,679 words
~920 § Hard Limits — NEVER exceed these numbers 549w
~615 § Subagent Routing — which agent for which task 311w
~578 § Validation State Lags Behind Actual Field State 337w
~4,087 tokens modes/_shared.md 2,184 words
Per-tool always-loaded cost (pick the tool you actually run)
Claude Code (CLAUDE.md) ~8,072 tokens / turn
~8,072 iso/instructions.md
note: CLAUDE.md not in repo; iso/instructions.md stands in as the compiled content.
AGENTS.md convention (opencode / Codex / Zed) ~12,159 tokens / turn
~8,072 iso/instructions.md
~4,087 modes/_shared.md
note: AGENTS.md not in repo; iso/instructions.md stands in as the compiled content.
Cursor (.cursor/rules) ~8,072 tokens / turn
~8,072 iso/instructions.md
note: .cursor/rules/ not in repo; iso/instructions.md stands in as the compiled content.
Worst-case tool: AGENTS.md convention at ~12,159 tokens / turn
Per-mode context (loads when that mode runs)
~4,549 tokens modes/apply.md 2,630 words
...
Worst case (worst tool + heaviest mode): ~16,708 tokens / turn
Token estimates use chars ÷ 4 — the same heuristic
perf-shared-prefix-budget uses, so the numbers line up with the
findings. Actual per-provider cost depends on the tokenizer.
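The chars ÷ 4 heuristic is simple enough to reproduce when sanity-checking a report. The sketch below uses a hypothetical `estimateTokens` helper; rounding up is an assumption, since the text above does not say how fractions are handled.

```typescript
// chars ÷ 4 heuristic from the text above.
// Assumption: fractions are rounded up; isolint's own rounding may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const sample = "You must return JSON with the fields po_number, vendor, and total.";
console.log(`${sample.length} chars is roughly ${estimateTokens(sample)} tokens`);
```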
CI guard. Put this in a workflow step to catch regressions when someone adds a 500-word "just one more thing" to a shared file:
- run: npx @agent-pattern-labs/isolint cost . --budget 10000

`--budget` exits non-zero (exit 1) when the worst-case tool total
exceeds N tokens, i.e. the most expensive tool group's bundle, not
a naive sum across tools.
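The difference between "worst-case tool total" and "naive sum" can be checked against the numbers in the sample report above. This is a sketch only: `ToolGroup`, `worstCase`, and `exitCodeForBudget` are hypothetical names, not isolint's API.

```typescript
// Hypothetical model of the per-tool load groups from the sample report.
interface ToolGroup {
  name: string;
  files: Record<string, number>; // file path -> estimated tokens
}

const toolGroups: ToolGroup[] = [
  { name: "Claude Code", files: { "iso/instructions.md": 8072 } },
  { name: "AGENTS.md convention", files: { "iso/instructions.md": 8072, "modes/_shared.md": 4087 } },
  { name: "Cursor", files: { "iso/instructions.md": 8072 } },
];

function groupTotal(group: ToolGroup): number {
  return Object.keys(group.files).reduce((sum, file) => sum + group.files[file], 0);
}

// The budget is compared against the most expensive single group,
// because only one tool's bundle is loaded per turn.
function worstCase(groups: ToolGroup[]): { name: string; total: number } {
  let worst = { name: "", total: 0 };
  for (const group of groups) {
    const total = groupTotal(group);
    if (total > worst.total) worst = { name: group.name, total };
  }
  return worst;
}

function exitCodeForBudget(groups: ToolGroup[], budget: number): number {
  return worstCase(groups).total > budget ? 1 : 0;
}

const worst = worstCase(toolGroups);
console.log(worst); // AGENTS.md convention, 12159
console.log(exitCodeForBudget(toolGroups, 10000)); // 1: over budget
console.log(worst.total + 4549); // 16708: worst tool + heaviest mode (modes/apply.md)
```

A naive sum across the three groups would give 28,303 tokens, which overstates what any single tool pays per turn.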
Other flags: --no-sections hides the per-section breakdown,
--no-gitignore disables the git allowlist (by default
cost and lint both skip files git ignores — generated build
output like CLAUDE.md from the agentmd / iso tooling isn't
counted).
agentmd is a structured-markdown
dialect for authoring LLM agent prompts: a # Agent: <name> H1,
explicit ## Hard limits / ## Defaults sections, and rule items
shaped as - [H1] claim with an indented why: rationale underneath.
The dialect compiles down to the plain AGENTS.md / CLAUDE.md /
Cursor rules that tools actually load.
isolint supports agentmd natively:
- Auto-detected. A file with a top-level `# Agent: <name>` heading is treated as agentmd. No config flag needed.
- Rationale stays. In plain harness prose, `Why:` paragraphs in a shared-prefix file are overhead, and `perf-rationale-in-shared-prefix` flags them. In agentmd the rationale is load-bearing (the model uses `why:` to judge edge cases), so that rule skips agentmd files. You can keep rich `why:` on every rule without fighting the linter.
- Everything else applies. Every other rule (`soft-imperative`, `taste-word`, `trailing-etc`, `long-sentence`, the cross-file duplication and emphasis-inflation checks, all of it) runs normally on agentmd files. The dialect changes rationale semantics, not prose-quality expectations.
- Mixed harnesses work. One lint run can have plain-prose mode files alongside agentmd-authored agents; each file is judged by the dialect it's actually in.
- Generated outputs are skipped (with `.gitignore` honored by default). If you author with agentmd and compile to `AGENTS.md` + `.cursor/rules/` + `.opencode/agents/`, only the source file is linted, not the N generated copies. That keeps findings in the file you actually edit.
If you're using agentmd and the linter is flagging your why:
paragraphs, the most likely cause is that the H1 isn't
# Agent: <name> (the detector looks for that exact shape). Rename
the H1 and isolint will recognise the dialect on the next run.
For JSON Plan files, use isolint validate --perf instead of the markdown
linter. That path runs plan-specific performance checks such as:
- repeated or restated steps
- instructions that duplicate `expected_output`
- schema details repeated in prose
- structured outputs that also ask for explanation
- tone/style guidance on structured outputs
- long low-signal step instructions
isolint plan uses the same checks during generation. If the model emits a
schema-valid plan with plan-performance findings, the planner retries with
the formatted findings as repair feedback until the plan is clean or it runs
out of attempts.
Rules never fire inside these spans:
| Span | Default | Why |
|---|---|---|
| Fenced code blocks | on | Example input/output, not prose |
| Inline code (`word`) | on | The word is being named |
| HTML comments | on | Author notes |
| Short double-quoted phrases (≤ 40 chars) | on | In `Avoid "leveraged", "cutting-edge"` the words are being named, not used. Full-sentence quoted directives (>40 chars) still lint. |
| YAML/TOML frontmatter | on | Opencode modes, Claude Code agents, Cursor .mdc rules: structured metadata, not prose instructions |
| `>` blockquotes | on | Usually example input/output or quoted prose, not harness instructions |
soft-imperative additionally skips findings inside questions (sentence
ending in ?) — What story should they tell? is not an instruction.
Disable any span via skip_spans in config.
Every finding is tagged with its enclosing ## Heading. You can mute or
downgrade rules inside specific sections:
{
"section_severity": {
"Examples": "off",
"Notes": "info",
"Changelog": "off"
}
}

Keys are case-insensitive. Useful for docs-heavy harness files where
`## Examples` contains intentional bad prose. The text reporter groups
findings by section so CI logs stay navigable.
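A sketch of how a per-section override could be resolved for one finding. The `effectiveSeverity` helper is hypothetical, and it assumes an override simply replaces the rule's own severity for findings under that heading; the text above only states that keys are matched case-insensitively.

```typescript
type Severity = "off" | "info" | "warn" | "error";

// Assumption: a matching section key replaces the rule's severity outright.
function effectiveSeverity(
  ruleSeverity: Severity,
  section: string | undefined,
  overrides: Record<string, Severity>,
): Severity {
  if (section === undefined) return ruleSeverity; // finding outside any ## heading
  const wanted = section.trim().toLowerCase();
  for (const key of Object.keys(overrides)) {
    if (key.trim().toLowerCase() === wanted) return overrides[key];
  }
  return ruleSeverity;
}

const sectionSeverity: Record<string, Severity> = {
  Examples: "off",
  Notes: "info",
  Changelog: "off",
};

console.log(effectiveSeverity("warn", "examples", sectionSeverity)); // off
console.log(effectiveSeverity("warn", "NOTES", sectionSeverity)); // info
console.log(effectiveSeverity("warn", "Step 1", sectionSeverity)); // warn
```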
Every finding carries its containing sentence; the tokenizer correctly handles abbreviations, filenames, decimals, URLs, and ellipses. This powers three things:
- Text reports show `in: "<sentence>"` so you can act without opening the file.
- `--fix --llm` coalesces multiple findings in the same sentence into one rewrite, which avoids conflicting edits.
- JSON/SARIF expose `sentence` and `section` fields so downstream tools (Claude Code, review bots) have the full context.
Three rules look beyond a single line or file:
- `undefined-step-reference` aggregates every `## Step N` / `## Block X` heading across every file in the lint set. A reference is flagged only if it isn't defined anywhere in the scanned harness. Multi-file harnesses that keep shared Blocks in `_shared.md` work out of the box.
- `missing-file-reference` reads the repo file list and flags any filename in prose (`cv.md`, `profile.json`, `schema.yaml`) that doesn't exist. Basename-matched, so references to `cv.md` from `modes/apply.md` resolve against `data/candidates/cv.md`.
- `context-budget` counts prose words (excluding code fences and frontmatter) per harness file. Override thresholds via `"context-budget.info_words"` / `"context-budget.warn_words"`.
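Basename matching for `missing-file-reference` can be sketched as follows. `missingReferences` is a hypothetical helper, and extracting the filenames from prose is assumed to have happened already.

```typescript
// Last path segment, e.g. "data/candidates/cv.md" -> "cv.md".
function basename(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1];
}

// A mention is missing only if no file anywhere in the repo shares its basename.
function missingReferences(mentioned: string[], repoFiles: string[]): string[] {
  const known = new Set(repoFiles.map(basename));
  return mentioned.filter((name) => !known.has(basename(name)));
}

const repoFiles = ["modes/apply.md", "data/candidates/cv.md"];
const mentionedInProse = ["cv.md", "profile.json"];
console.log(missingReferences(mentionedInProse, repoFiles)); // ["profile.json"]
```

The trade-off of matching on basename is that a reference resolves even when the file lives in a different directory than the prose implies.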
The simulator is deterministic. To see how the rewrites move the needle
on your actual target model, use verify:
npx @agent-pattern-labs/isolint verify \
--harness modes/classify.md \
--input tests/sample.json \
--small mistralai/mistral-7b-instruct \
  --large anthropic/claude-3.5-sonnet

`verify` runs the original harness through the small model, applies
`--fix --llm` rewrites via the large model, runs the fixed harness through
the small model again, and reports: simulator fragility before/after,
harness size before/after (chars, words, approximate prompt tokens,
sentences, performance findings), char-delta in output, JSON-validity
before/after, and the raw model outputs for side-by-side inspection.
This is the first time the "small models don't break" claim can be
validated empirically on the actual target.
The tagline is "rewrites harnesses so small models don't break." A
simulator codifies known 7B-class failure modes (drops should, invents
items past etc., skips when relevant conditionals, loses the middle
of long sentences, hallucinates dangling $input.X refs) and produces a
fragility score — the fraction of instructions a weak model would
break on.
The regression-sim test suite asserts that for every fixture, running
isolint --fix strictly decreases the fragility score. Measured on
the built-in fixtures:
| Fixture | Before fix | After fix |
|---|---|---|
| `classify` | 0.75 (3 fail / 1 follow) | 0.00 (0 fail / 4 follow) |
| `multi-step` | 0.75 (3 fail / 1 follow) | 0.25 (1 fail / 3 follow) |
CI fails if a rule change regresses either fixture. For the first time, the "so small models don't break" claim is a number, not an assertion.
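The fragility numbers in the table follow directly from the definition (the fraction of instructions a weak model would break on). A small sketch with hypothetical helper names reproduces them along with the "strictly decreases" check:

```typescript
interface FixtureResult {
  fail: number; // instructions the simulated weak model breaks on
  follow: number; // instructions it follows
}

// Fragility = fail / (fail + follow); defined as 0 for an empty fixture.
function fragility(r: FixtureResult): number {
  const total = r.fail + r.follow;
  return total === 0 ? 0 : r.fail / total;
}

// The regression suite's property: the fix must strictly lower fragility.
function fixImproves(before: FixtureResult, after: FixtureResult): boolean {
  return fragility(after) < fragility(before);
}

const classifyBefore = { fail: 3, follow: 1 };
const classifyAfter = { fail: 0, follow: 4 };
const multiStepAfter = { fail: 1, follow: 3 };

console.log(fragility(classifyBefore)); // 0.75
console.log(fragility(classifyAfter)); // 0
console.log(fragility(multiStepAfter)); // 0.25
console.log(fixImproves(classifyBefore, multiStepAfter)); // true
```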
Every LLM rewrite is re-linted before being applied. A rewrite is accepted only if:
- The rule that triggered the fix no longer fires on the new text.
- No new rules fire on the new text.
- Markdown structure is preserved (heading, list, code-fence, inline-code, bold, and link counts all match).
If a rewrite fails validation, the linter retries once with explicit
feedback about what went wrong. If the retry also fails, the fix is
skipped and reported in the fix summary — no mangled prose ever lands
on disk. This is what makes --fix --llm safe to run in CI.
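The markdown-structure criterion can be approximated by counting each element type before and after the rewrite. The regexes below are rough heuristics chosen for illustration, not isolint's parser, and `structureCounts` / `sameStructure` are hypothetical names.

```typescript
interface StructureCounts {
  headings: number;
  listItems: number;
  codeFences: number;
  inlineCode: number;
  bold: number;
  links: number;
}

function count(text: string, re: RegExp): number {
  return (text.match(re) ?? []).length;
}

// Heuristic counts; a real implementation would walk a markdown AST.
function structureCounts(md: string): StructureCounts {
  return {
    headings: count(md, /^#{1,6}\s/gm),
    listItems: count(md, /^\s*(?:[-*+]|\d+\.)\s/gm),
    codeFences: count(md, /^\s*`{3}/gm),
    inlineCode: count(md, /`[^`\n]+`/g),
    bold: count(md, /\*\*[^*\n]+\*\*/g),
    links: count(md, /\[[^\]\n]*\]\([^)\n]*\)/g),
  };
}

// A rewrite preserves structure only if every count matches.
function sameStructure(original: string, rewrite: string): boolean {
  const a = structureCounts(original);
  const b = structureCounts(rewrite);
  return (Object.keys(a) as (keyof StructureCounts)[]).every((k) => a[k] === b[k]);
}

const original = "## Step 1\n- You should return **JSON** as in `schema.json`.";
const goodRewrite = "## Step 1\n- Return **JSON** as in `schema.json`.";
const badRewrite = "## Step 1\n- Return JSON as in `schema.json`."; // bold dropped

console.log(sameStructure(original, goodRewrite)); // true
console.log(sameStructure(original, badRewrite)); // false
```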
Three mechanisms work together to improve rewrite quality:
- Few-shot examples per rule: each rule ships canonical `bad → good` pairs that are included in the prompt when that rule fires. This grounds the model in the intended fix direction instead of hoping it guesses. Every example is also a self-test: CI asserts that `bad` triggers the rule and `good` doesn't.
- Self-consistency sampling: 3 candidates are generated in parallel per attempt; the validator scores each; the lowest-problem candidate wins. Tunable via `samples_per_attempt` (default 3; set to 1 to disable).
- Feedback-driven retry: a rejected rewrite is fed back to the model with its specific validation problems ("rewrite still violates X"; "markdown structure changed"). The retry is targeted, not a cold reroll.
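The selection step of self-consistency sampling can be sketched as below. It takes candidates that have already been generated and validated; the `Candidate` type and both helpers are hypothetical, and treating only a zero-problem winner as accepted is an assumption based on the acceptance criteria described earlier.

```typescript
interface Candidate {
  text: string;
  problems: string[]; // validator output; empty means the rewrite passed
}

// Lowest problem count wins; ties keep the earliest candidate.
function pickBest(candidates: Candidate[]): Candidate | null {
  if (candidates.length === 0) return null;
  let best = candidates[0];
  for (const c of candidates) {
    if (c.problems.length < best.problems.length) best = c;
  }
  return best;
}

// Assumption: the winner is applied only when it has no remaining problems;
// otherwise its problems become the feedback for the retry.
function acceptOrFeedback(candidates: Candidate[]): { accepted: string | null; feedback: string[] } {
  const best = pickBest(candidates);
  if (best === null) return { accepted: null, feedback: ["no candidates"] };
  return best.problems.length === 0
    ? { accepted: best.text, feedback: [] }
    : { accepted: null, feedback: best.problems };
}

const samples: Candidate[] = [
  { text: "You should return JSON.", problems: ["rewrite still violates soft-imperative"] },
  { text: "Return JSON.", problems: [] },
  { text: "Return JSON", problems: ["markdown structure changed"] },
];
console.log(acceptOrFeedback(samples)); // accepted: "Return JSON."
```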
isolint lint --fix --llm --stats prints a per-rule table of rewrite
outcomes — candidates, first-try accepts, accepts after retry, rejects,
and an overall accept percentage. Low accept rates are a signal that the
rule's examples or message need work.
Run against JobForge's modes/ directory (19 files, 1900+ lines of
opencode harness prose):
$ npx @agent-pattern-labs/isolint lint /Users/you/JobForge/modes
...
19 files scanned — 54 warnings, 6 infos
Every finding points at a concrete phrase that a 7B model will handle
worse than a Claude-class model. --fix --llm rewrites each one
while preserving intent and markdown formatting.
npx tsx examples/lint-demo/run.ts

Runs lint + fix end-to-end against a sample file with a mock rewriter.
A Plan is a pure data object (see src/schema/plan.ts):
- `input_schema`: JSON Schema for the user input.
- `steps[]`: ordered, atomic steps. Each step has:
  - `instruction`: a single imperative task.
  - `inputs`: explicit references, either `$input` or `$steps.<id>`.
  - `constraints`: machine-checkable rules (length, regex, enum).
  - `expected_output`: `text`, `enum`, `json` + `schema`, or `list`.
  - `failure_handling`: `retry` | `repair` | `fallback` | `fail`.
- `final_output`: which step (or composition of steps) is the result.
Every plan is validated against src/schema/plan.ts before execution.
Every step output is validated against its expected_output.
Nothing is trusted.
For each step, the runtime:
- Resolves `inputs` (`$input`, prior step outputs).
- Builds a minimal prompt: instruction, labeled inputs, constraints, explicit output format. No persona preamble, no hidden reasoning.
- Calls the small model.
- Validates the output:
  - Output kind (text/enum/json/list) + schema.
  - Constraints (`must_match` / `must_not_match` regex).
- On failure, follows the step's `failure_handling`:
  - `retry`: same prompt.
  - `repair`: append validator errors, ask for a fix.
  - `fallback`: return a deterministic value.
  - `fail`: abort the plan.
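The per-step loop above can be sketched with a synchronous mock model. This is an illustration under stated assumptions, not the runtime in `src/runtime/`: the `FailureHandling` shape (including `maxAttempts`, counted as total model calls), the `runStep` helper, and the repair prompt wording are all hypothetical.

```typescript
// Hypothetical encoding of the four failure_handling modes.
type FailureHandling =
  | { kind: "retry"; maxAttempts: number }
  | { kind: "repair"; maxAttempts: number }
  | { kind: "fallback"; value: string }
  | { kind: "fail" };

type Model = (prompt: string) => string;
type Validator = (output: string) => string[]; // error messages; empty = valid

function runStep(prompt: string, model: Model, validate: Validator, onFailure: FailureHandling): string {
  const attempts =
    onFailure.kind === "retry" || onFailure.kind === "repair" ? onFailure.maxAttempts : 1;
  let currentPrompt = prompt;
  let errors: string[] = [];
  for (let i = 0; i < attempts; i++) {
    const output = model(currentPrompt);
    errors = validate(output);
    if (errors.length === 0) return output;
    if (onFailure.kind === "repair") {
      // repair: append validator errors and ask for a fix; retry reuses the same prompt
      currentPrompt = `${prompt}\n\nYour previous output was invalid:\n- ${errors.join("\n- ")}\nReturn a corrected output.`;
    }
  }
  if (onFailure.kind === "fallback") return onFailure.value; // deterministic value
  throw new Error(`step failed: ${errors.join("; ")}`); // abort the plan
}

// Mock small model: wrong on the first try, correct once it sees validator feedback.
const mockModel: Model = (prompt) =>
  prompt.indexOf("previous output was invalid") >= 0 ? "high" : "urgent!!";

const allowed = ["low", "medium", "high"];
const validateEnum: Validator = (out) =>
  allowed.indexOf(out.trim()) >= 0 ? [] : [`expected one of ${allowed.join(", ")}, got "${out}"`];

console.log(runStep("Classify severity.", mockModel, validateEnum, { kind: "repair", maxAttempts: 2 })); // high
console.log(runStep("Classify severity.", mockModel, validateEnum, { kind: "fallback", value: "medium" })); // medium
```

With this deterministic mock, `retry` never succeeds because the same prompt yields the same invalid output, which is the case `repair` is meant to address.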
If you just want the CLI, run it directly with npx:
npx @agent-pattern-labs/isolint lint ./modes

Or add it to a project:
npm install -D @agent-pattern-labs/isolint

If you're developing this repo itself:
npm install
npm run build

Requires Node ≥ 18.17.
Copy the checked-in example env file if you want local defaults:
cp .env.example .env

Preferred env vars:
OPENROUTER_API_KEY=sk-or-...
ISOLINT_LARGE=anthropic/claude-3.5-sonnet
ISOLINT_SMALL=mistralai/mistral-7b-instruct

Legacy `ISOMODEL_LARGE` / `ISOMODEL_SMALL` are still accepted for
compatibility.
Force provider selection with --provider openrouter|openai|ollama|custom.
OpenRouter and OpenAI work out of the box. Groq, Together, vLLM, and other
OpenAI-compatible endpoints work via --provider custom --base-url.
lint and verify are covered above because they are the main linter
workflows. The commands below are for the plan/runtime pipeline.
Ask the large model to generate a Plan for a new task.
npx @agent-pattern-labs/isolint plan \
--task "Extract purchase orders from emails into {po_number, vendor, total}" \
  --out plans/po-extract.json

After writing the file, `plan` also runs the advisory plan-performance
checks and prints any overhead findings immediately.
Run an existing Plan through the small model.
npx @agent-pattern-labs/isolint run \
--plan examples/multi-step-reasoning/plan.json \
  --input examples/multi-step-reasoning/input.json

Pipe stdin:
cat ticket.json | npx @agent-pattern-labs/isolint run --plan plan.json --input -

Schema-check a plan without running it.
npx @agent-pattern-labs/isolint validate --plan examples/cold-email/plan.json

Run the schema check plus advisory performance analysis:
npx @agent-pattern-labs/isolint validate \
  --plan examples/cold-email/plan.json \
  --perf

CI / local gate for bundled examples:
npm run check:plans

Three ready-to-run plans live in `examples/`:
| Example | What it shows |
|---|---|
| `cold-email/` | Structured paragraphs with regex constraints (subject/hook/value/ask) |
| `data-extraction/` | Strict-schema JSON extraction + numeric computation over prior steps |
| `multi-step-reasoning/` | Support-ticket triage: classify → severity → root cause → next action |
npx tsx examples/offline-demo/run.ts

Runs the multi-step-reasoning plan end-to-end against a mock small model.
Proves the full loop: input validation → per-step execution → schema
validation → final output composition.
npx @agent-pattern-labs/isolint run \
--plan examples/data-extraction/plan.json \
--input examples/data-extraction/input.json \
  --small mistralai/mistral-7b-instruct

import { Planner, Runtime, createProvider } from "@agent-pattern-labs/isolint";

// The large model emits the strict JSON Plan once.
const plan = await new Planner(
  createProvider({ model: "anthropic/claude-3.5-sonnet" }),
).generate({ task: "Classify tickets and propose a next action" });

// The small model executes the Plan step by step, with every output validated.
const result = await new Runtime(
  createProvider({ model: "mistralai/mistral-7b-instruct" }),
).run(plan.plan, { ticket: { subject: "...", body: "..." } });

console.log(result.final);

- No long runtime prompts. All complexity lives in the Plan.
- No vague instructions. The planner system prompt forbids "be creative", "engaging", "appropriate", etc.
- No world-knowledge dependence. Every step must cite `$input` or a prior step as its fact source.
- No taste-based validation. Constraints must be regex, length, enum, or schema.
- Deterministic by default. Runtime temperature defaults to `0`.
npm test

The suite covers lint rules, suppressions, fix engine, diff output, schema validation, JSON extraction, runtime retry/repair, fallback handling, and input-schema enforcement.
MIT