Methodology comparison: current ADR-driven workflow vs. GitHub Spec-Kit (SDD) vs. BMAD Method #51

ma3u · 2026-05-16T22:28:03Z

ma3u
May 16, 2026
Maintainer

Why this discussion

We currently run a spec-/ADR-driven workflow (Knative/KEP-style governance):

25 ADRs in docs/ADRs/ — the architectural decision log
A monolithic planning document (docs/planning-health-dataspace-v2.md) — 27+ phases, ~69 ADR cross-references
GitHub Issues — per-task tracking, linked to ADRs/phases
Full Claude Code config — .claude/agents/ (4 subagents: architect, compliance-reviewer, security, tester), .claude/commands/ (3 slash commands), .claude/rules/ (api-conventions, code-style, testing), .claude/skills/, plus CLAUDE.md

This issue evaluates whether to adopt — wholly or partially — one of two popular agentic methodologies:

GitHub Spec-Kit / Spec-Driven Development (SDD) — https://github.com/github/spec-kit
BMAD Method — https://github.com/bmad-code-org/BMAD-METHOD

The three approaches in brief

1. Current — ADR-native / KEP-style

Decisions captured as ADRs; a planning doc as roadmap; issues as the task queue; Claude behaviour shaped by hand-written CLAUDE.md + rules + scoped subagents. No formal per-feature spec → plan → tasks loop — features go straight from an issue/ADR to implementation.

2. GitHub Spec-Kit (SDD)

Inverts the usual relationship between spec and code: "specs become executable blueprints." Seven-step loop, each step a slash command:

constitution → specify → clarify → plan → tasks → analyze → implement

Artifacts land in .specify/<feature-id>/ (spec, research, plan, contracts, task list). Installs native slash commands (/speckit.specify, /speckit.plan, …) into the agent's command directory. Agent-agnostic (30+ agents incl. Claude Code).

3. BMAD Method

Agentic agile framework: 12+ specialist agent personas (PM, Architect, Dev, UX, QA…) that collaborate, sometimes in "Party Mode." Scale-adaptive — adjusts planning ceremony from bug-fix to enterprise. 34+ workflows. Own installer (npx bmad-method install); needs Node 20+, Python 3.10+, uv. Ships .claude-plugin/ integration.

Side-by-side

Dimension	Current (ADR-native)	Spec-Kit / SDD	BMAD Method
Unit of planning	ADR + planning-doc phase	Per-feature spec folder	Per-project, scale-adaptive
Decision log	✅ 25 ADRs (mature)	⚠️ "constitution" only — no decision history	⚠️ design docs, no ADR equivalent
Per-feature rigor	❌ informal	✅ spec→clarify→plan→tasks	✅ workflow-driven
Claude Code integration	Hand-rolled `CLAUDE.md` + rules	✅ native slash commands, drop-in	⚠️ own plugin + installer + extra toolchain
Setup friction	none (already in place)	low (`specify init`, slash cmds)	high (Node+Python+uv, installer)
Agent model	4 scoped subagents	none (workflow-based)	12+ personas (overlaps our subagents)
Best fit	mature projects, audit trail	greenfield features	greenfield projects
Maturity tax on us	already paid	additive, opt-in	rip-and-replace-ish

Where each shines / hurts for this project

This is a mature codebase — Phase 27, 25 ADRs, 5,300+ graph nodes, deployed to Azure. Both alternatives are designed to add the most value at project inception, which we are well past.

Spec-Kit

➕ The specify → plan → tasks loop fills our real gap: we have no formalised per-feature spec/plan step. It would have caught the "0 Participant Profiles" issue earlier (spec would have stated the Neo4j-fallback contract).
➕ Slash commands drop straight into .claude/commands/ — same mechanism we already use.
➕ Complements ADRs rather than replacing them: ADR = why, spec = what, plan = how.
➖ Its "constitution" duplicates what CLAUDE.md + .claude/rules/ already do.
➖ Ceremony is heavy for one-line fixes (no built-in scale-down).

BMAD

➕ Scale-adaptive ceremony is genuinely nice — formal for big features, light for fixes.
➕ Strong agile structure if the team grows.
➖ 12+ personas overlap and would compete with our 4 already-tuned subagents.
➖ Extra runtime toolchain (Node/Python/uv) + own installer = more moving parts on a project that already juggles 19 services.
➖ More of an all-in framework commitment; harder to adopt partially.

Recommendation

Adopt Spec-Kit's per-feature loop selectively; keep ADRs; do not adopt BMAD wholesale.

Keep ADRs as the architectural decision log — neither alternative has a real equivalent, and 25 ADRs of audit trail is an asset, not debt (especially for an EHDS compliance demo).
Add Spec-Kit's specify → plan → tasks loop for net-new features, e.g. the open issues #8 ([FEATURE] Cross-Participant Dataset Discovery - "Find all diabetes datasets across German hospitals"), #22 (DCP BusinessWallet + EUDI Wallet Sandbox: hybrid patient/B2B credential stack (Option B)), and #24 (GesundheitsID OIDC RP via gematik open-source stack (Option C)). It plugs the one genuine gap in our workflow and integrates as plain slash commands.
Map, don't duplicate: ADR = decision rationale, Spec-Kit spec = feature requirements, Spec-Kit plan = implementation design, GitHub issue = tracking. Skip Spec-Kit's "constitution" — CLAUDE.md already is ours.
Borrow BMAD's scale-adaptive idea without the framework: a one-line "skip spec/plan for trivial fixes" rule in CLAUDE.md.
Keep our 4 subagents — they are scoped to this project; BMAD's generic personas would regress that.

Easiest to use in Claude Code?

Spec-Kit, clearly. It installs as native slash commands into the same .claude/commands/ directory we already use, is agent-agnostic, and needs no extra runtime once initialised (the one-time specify init does rely on the specify CLI, which is distributed as a Python tool). BMAD brings its own plugin system, installer, and Node+Python+uv toolchain: more friction, and more overlap with what we already have tuned.

Open questions

Should we pilot Spec-Kit on one open feature issue (#8, [FEATURE] Cross-Participant Dataset Discovery - "Find all diabetes datasets across German hospitals", is a good candidate) and review the artifact overhead before committing?
If we adopt Spec-Kit, do specs live in .specify/ or fold into docs/?
Should ADR creation become a Spec-Kit /speckit.analyze-style gate, or stay manual?
Anyone want to argue the BMAD case more strongly — particularly the multi-persona collaboration?

Comparison drafted with Claude Code; figures from each project's README as of 2026-05.

ma3u · 2026-05-16T22:41:42Z

ma3u
May 16, 2026
Maintainer Author

Deep dive: token usage of the 3 methods

Token cost is not one number — it splits into four buckets that behave very differently:

Static / always-on — injected into every prompt; cacheable.
Per-feature working set — files pulled in while a feature is in flight.
Generated artifacts (output tokens) — text the model writes (output tokens are ~4–5× the price of input).
Cache behaviour — how often the cacheable prefix is invalidated.

All figures below at ~4 chars/token. The current numbers are measured from this repo; Spec-Kit and BMAD are estimated from each method's structure (no install measured).
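As a rough illustration of the ~4 chars/token heuristic, the sketch below estimates a token count from character length. The function names are hypothetical, and a real tokenizer would give somewhat different numbers, so this is an approximation rather than the exact measurement procedure used for the tables.

```python
import math

# Heuristic stated in the post; real tokenizers vary by language and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Approximate token count: characters divided by chars-per-token, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def estimate_file_tokens(path: str) -> int:
    """Approximate token count of a UTF-8 text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return estimate_tokens(fh.read())
```

For example, estimate_file_tokens("CLAUDE.md") would return a figure comparable to the ones in the tables below, assuming the file is read from the repository root.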

1. Static / always-on footprint

Current — measured, loaded every turn:

File	Tokens
`CLAUDE.md` (project)	~1,832
`.claude/rules/api-conventions.md`	~1,972
`.claude/rules/code-style.md`	~976
`.claude/rules/testing.md`	~1,050
Project static subtotal	~5,830
Global SuperClaude `~/.claude/` chain*	~24,055
Effective always-on total	~29,900

Note: the SuperClaude global chain (PERSONAS.md ~5.2K, ORCHESTRATOR.md ~5.7K, MODES.md ~3.5K, MCP.md ~2.9K …) is user-level config and applies identically to all 3 methods — it is not a differentiator. The method-attributable static cost today is the ~5.8K of project files.
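The subtotal and total in the table above can be re-derived from the per-file figures. The snippet only re-adds the numbers quoted in this post; the variable names are illustrative.

```python
# Per-file token estimates as quoted in the table above.
PROJECT_STATIC = {
    "CLAUDE.md": 1832,
    ".claude/rules/api-conventions.md": 1972,
    ".claude/rules/code-style.md": 976,
    ".claude/rules/testing.md": 1050,
}
GLOBAL_CHAIN = 24055  # user-level SuperClaude chain, identical for all three methods

project_subtotal = sum(PROJECT_STATIC.values())
always_on_total = project_subtotal + GLOBAL_CHAIN

print(project_subtotal)  # 5830
print(always_on_total)   # 29885, i.e. ~29,900 after rounding
```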

Spec-Kit: ~0 added always-on. Its "constitution" (~1–2K) is referenced by commands, not globally injected. Slash-command templates load only on invocation. Static delta ≈ 0.
BMAD: small always-on if it installs an orchestrator/router via .claude-plugin/ (~1–3K); the 12+ personas are on-demand. Static delta ≈ 1–3K.

➡️ On always-on cost the three are roughly equal. Spec-Kit is marginally the leanest.

2. Per-feature working set

Current — measured unit costs:

On-demand load	Tokens
Subagent (`architect`/`security`/…)	~600–1,070
Slash command (`fix-issue`/`review`/…)	~566–650
One ADR (avg of 25)	~1,574
`docs/planning-health-dataspace-v2.md` read whole	~72,279 ⚠️

Realistic current per-feature method overhead: 1 subagent (~900) + 1–2 ADRs (~3K) + 1 issue (~0.5K) ≈ 4–6K. The dominant risk is the 72K monolithic planning doc — a single full read of it costs more than an entire Spec-Kit or BMAD feature cycle. CLAUDE.md tells Claude to "consult the planning document"; whether that's a grep or a full read swings the bill by ~70K.
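To make the grep-versus-full-read difference concrete, here is a minimal sketch of grep-style access: it returns only the matching lines plus a little context, so the tokens pulled into the conversation scale with the excerpt rather than with the whole document. The helper names and the context width are assumptions for illustration, not how Claude Code actually reads files.

```python
import re


def estimate_tokens(text: str) -> int:
    """Approximate tokens at ~4 chars/token, rounded up."""
    return -(-len(text) // 4)


def grep_with_context(text: str, pattern: str, context: int = 2) -> str:
    """Return matching lines plus `context` lines before and after each match."""
    lines = text.splitlines()
    rx = re.compile(pattern, re.IGNORECASE)
    keep = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    return "\n".join(lines[i] for i in sorted(keep))
```

Comparing estimate_tokens(grep_with_context(doc, "Phase 12")) with estimate_tokens(doc) on the planning document would show the size of the swing described above.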

Spec-Kit: the specify→clarify→plan→tasks→implement loop creates and re-reads artifacts. Command template per step ~1–2K; feature artifacts — spec ~1.5–3K, plan ~2–5K, tasks ~1.5–3K, research ~1–3K. During the implement phase the resident working set is ~6–14K, plus ~8–15K of template overhead spread across the loop's turns. Per-feature input overhead ≈ 15–30K.
BMAD: the persona model front-loads heavily. One agent activation = persona (2–6K) + its dependency files (templates/checklists/tasks, 3–10K) = ~5–16K per agent. A full feature touching PM→Architect→Dev→QA = 4 activations ≈ 20–50K. "Party Mode" holds 3–4 personas resident at once = 15–30K+ concurrently. Per-feature input overhead ≈ 25–55K — the heaviest.
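The Spec-Kit and BMAD ranges above are sums of smaller estimated ranges. The sketch below adds them end to end (all values in thousands of tokens, taken from the two paragraphs above; the Range class is a hypothetical helper). Note that naively multiplying one BMAD activation by four gives an upper bound of 64K, which is higher than the ~50K quoted; the post does not say how the upper end was trimmed, so treat the quoted figure as the author's judgment rather than a strict sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A low/high estimate in thousands of tokens."""
    lo: float
    hi: float

    def __add__(self, other: "Range") -> "Range":
        return Range(self.lo + other.lo, self.hi + other.hi)

    def scale(self, n: int) -> "Range":
        return Range(self.lo * n, self.hi * n)


# Spec-Kit: spec + plan + tasks + research artifacts, then template overhead.
speckit_artifacts = Range(1.5, 3) + Range(2, 5) + Range(1.5, 3) + Range(1, 3)
speckit_total = speckit_artifacts + Range(8, 15)

# BMAD: persona + dependency files per activation, four activations per feature.
bmad_activation = Range(2, 6) + Range(3, 10)
bmad_four = bmad_activation.scale(4)

print(speckit_artifacts)  # Range(lo=6.0, hi=14)
print(speckit_total)      # Range(lo=14.0, hi=29)
print(bmad_activation)    # Range(lo=5, hi=16)
print(bmad_four)          # Range(lo=20, hi=64)
```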

3. Generated artifacts (output tokens — the expensive kind)

Current: ADRs are largely hand-written/lightly assisted; little AI-generated planning prose. Output stays close to the code diff.
Spec-Kit: specify+clarify+plan+tasks generate ~8–20K output tokens of artifacts before a line of code. That is the methodology's core cost — and its core value.
BMAD: design docs, architecture specs, user stories across personas → ~10–25K output per feature, similar-to-higher than Spec-Kit.

Output tokens dominate cost, so this bucket matters most: Current ≪ Spec-Kit ≲ BMAD.
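One way to see why the output bucket dominates is to express everything in input-token equivalents. This is a simplified sketch under stated assumptions: the 5× output ratio is the upper end of the ~4–5× mentioned earlier, cache discounts are ignored, and the function name is made up for illustration.

```python
def weighted_cost(input_k: float, output_k: float, output_ratio: float = 5.0) -> float:
    """Cost in thousands of input-token equivalents; output weighted by output_ratio."""
    return input_k + output_ratio * output_k


# Ranges quoted in this post (thousands of tokens): (input, output).
speckit_low = weighted_cost(15, 8)
speckit_high = weighted_cost(30, 20)
bmad_low = weighted_cost(25, 10)
bmad_high = weighted_cost(55, 25)

print(speckit_low, speckit_high)  # 55.0 130.0
print(bmad_low, bmad_high)        # 75.0 180.0
```

Under these assumptions the generated artifacts account for well over half of the weighted total in every case, which is the point of the ordering above.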

4. Cache behaviour

Current: the ~5.8K project static block is stable → strong prefix cache hits. ADRs are read mid-conversation, after the cacheable prefix, so they re-cost only until the next cache write.
Spec-Kit: command templates are stable; per-feature artifacts change → moderate cache efficiency. The loop is linear, so the prefix grows monotonically — cache-friendly within a feature.
BMAD: persona switching rewrites large context blocks mid-conversation → frequent cache-prefix invalidation → worst cache economy. Party Mode is the worst case: the resident set churns as agents take turns.
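The cache argument can be illustrated with a toy model: treat each turn's prompt as a list of context blocks and count how many leading blocks are unchanged from the previous turn. This is a deliberately simplified sketch. It counts blocks rather than tokens, the block names are invented, and it assumes a persona block sits early in the prompt, which may not match how BMAD actually lays out context.

```python
def shared_prefix_len(prev: list, curr: list) -> int:
    """Number of leading blocks that are identical in both prompts."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def cache_hit_ratio(turns: list) -> float:
    """Fraction of blocks (from turn 2 onward) covered by the previous turn's prefix."""
    hits = total = 0
    for prev, curr in zip(turns, turns[1:]):
        hits += shared_prefix_len(prev, curr)
        total += len(curr)
    return hits / total if total else 0.0


# Linear loop: each turn appends to the previous prompt.
linear = [
    ["static", "spec"],
    ["static", "spec", "plan"],
    ["static", "spec", "plan", "tasks"],
]

# Persona switching: an early block changes every turn.
switching = [
    ["static", "persona:pm", "prd"],
    ["static", "persona:architect", "prd"],
    ["static", "persona:dev", "prd"],
]

print(round(cache_hit_ratio(linear), 3))     # 0.714
print(round(cache_hit_ratio(switching), 3))  # 0.333
```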

Bottom line

Bucket	Current	Spec-Kit	BMAD
Static always-on (method-attributable)	~5.8K	~0	~1–3K
Per-feature input overhead	~4–6K*	~15–30K	~25–55K
Generated output per feature	low	~8–20K	~10–25K
Cache efficiency	best	good	weakest
Relative per-feature token cost	1×	~3–5×	~5–9×

* excludes the 72K planning-doc full-read risk.

Takeaways

Current is the cheapest per feature — by a wide margin — but carries one latent landmine: the 72K planning doc. Splitting it into per-phase files (or enforcing grep-only access) would remove the single biggest token risk we have, independent of any methodology choice. Recommend doing this regardless.
Spec-Kit costs ~3–5× per feature — that delta is the spec/plan/tasks artifacts. It is a deliberate trade: tokens spent up front to reduce rework tokens later. Reasonable for net-new features; wasteful for one-line fixes (it has no built-in scale-down — pair it with a "skip for trivial fixes" rule).
BMAD is the most token-hungry — ~5–9× — driven by the persona+dependency model and cache-hostile persona switching. Party Mode amplifies this further.

This reinforces the main recommendation: adopt Spec-Kit's loop selectively for substantial net-new features (pay the 3–5× only where the planning rigor earns it), keep the lightweight ADR path for everything else, and avoid BMAD's per-activation persona overhead.

Current figures measured from this repo on 2026-05-17; Spec-Kit/BMAD figures estimated from each method's documented structure.

ma3u · 2026-05-16T22:54:22Z

ma3u
May 16, 2026
Maintainer Author

Re-evaluation after splitting the 72K planning doc

Done — docs/planning-health-dataspace-v2.md has been split. Re-running the token math below.

What changed (measured)

Artifact	Before	After
`planning-health-dataspace-v2.md`	~72,279 tok (one monolith)	~7,708 tok (slim index)
`docs/planning/roadmap-phases-01-10.md`	—	~17,489 tok
`docs/planning/roadmap-phases-11-20.md`	—	~18,383 tok
`docs/planning/roadmap-phases-21-24.md`	—	~14,100 tok
`docs/planning/cross-cutting-and-architecture.md`	—	~11,143 tok

The index keeps what's always relevant — the GitHub Issues table, the phase-status summary, and the ADR index (25 ADRs). Per-phase narrative detail moved to four archive files, each independently readable. (The ~3.5K net shrink is the deleted 200-line in-doc TOC.)
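The ~3.5K net shrink can be checked directly from the measured figures in the table above. The snippet only re-adds those numbers; the variable names are illustrative.

```python
BEFORE = 72_279  # monolithic planning doc, tokens

AFTER = {
    "planning-health-dataspace-v2.md": 7_708,  # slim index
    "docs/planning/roadmap-phases-01-10.md": 17_489,
    "docs/planning/roadmap-phases-11-20.md": 18_383,
    "docs/planning/roadmap-phases-21-24.md": 14_100,
    "docs/planning/cross-cutting-and-architecture.md": 11_143,
}

after_total = sum(AFTER.values())
net_shrink = BEFORE - after_total

print(after_total)  # 68823
print(net_shrink)   # 3456, i.e. ~3.5K
```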

Re-evaluated per-feature token cost — "Current" method

Scenario	Before split	After split
Typical feature (read planning context + 1–2 ADRs + subagent + issue)	4–6K if Claude greps; ~72K if it reads the doc whole	~8K index + ~3K ADRs + ~0.9K subagent + ~0.5K issue ≈ ~12K
Worst case (needs deep phase detail)	~72K (whole doc)	~8K index + one ~18K archive ≈ ~26K
Predictability	❌ swings 4K–72K depending on how Claude reads	✅ bounded ~8–26K

The landmine is defused. The catastrophic 72K single-read is gone: worst case is now ~26K, typical ~12K. More importantly the cost is now predictable — the index is small enough (~8K) that reading it whole every feature is the sensible default, and CLAUDE.md now explicitly tells Claude not to read the archives unless a specific phase's detail is needed.

Updated comparison table

Bucket	Current (post-split)	Spec-Kit	BMAD
Static always-on (method-attributable)	~5.8K	~0	~1–3K
Per-feature input overhead	~12K typical, ~26K worst (was 4–72K)	~15–30K	~25–55K
Generated output per feature	low	~8–20K	~10–25K
Cache efficiency	best	good	weakest
Relative per-feature token cost	1×	~1.5–2.5×	~3–4.5×
Predictability	✅ bounded (was ❌)	✅	⚠️ persona churn

Note that the multipliers compressed (Spec-Kit was ~3–5×, BMAD ~5–9×), not because those methods got cheaper, but because the Current baseline is now an honest, stable ~12K instead of an artificially low 4–6K that ignored the 72K risk. This is the apples-to-apples number.

Revised conclusion

The original recommendation stands and is now stronger:

The single concrete argument against the Current method — the unbounded planning-doc read — no longer exists. Current is now the clear, predictable, cheapest default.
Spec-Kit's ~1.5–2.5× per-feature premium still buys real spec/plan/tasks rigor. It is worth it for substantial net-new features (issues #8 ([FEATURE] Cross-Participant Dataset Discovery - "Find all diabetes datasets across German hospitals"), #22 (DCP BusinessWallet + EUDI Wallet Sandbox: hybrid patient/B2B credential stack (Option B)), and #24 (GesundheitsID OIDC RP via gematik open-source stack (Option C))) and skippable for small fixes.
BMAD's ~3–4.5× and cache-hostile persona churn remain the weakest fit.

Recommendation unchanged: keep the ADR-native workflow as the default (now genuinely lightweight and predictable), layer Spec-Kit's loop selectively on big features, and apply this same split discipline to any future doc that grows past ~15K tokens.
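The split discipline could be enforced with a small check, for instance in CI. The sketch below is one possible shape, not an existing script in the repo: the function name, the *.md glob, and the reuse of the ~4 chars/token heuristic are all assumptions. As written, it would also flag the two existing ~17–18K archive files, so the threshold or an allowlist would need tuning.

```python
from pathlib import Path


def oversized_docs(root: str, limit_tokens: int = 15_000, chars_per_token: int = 4):
    """Return (path, estimated_tokens) for Markdown files above limit_tokens."""
    results = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        tokens = len(text) // chars_per_token
        if tokens > limit_tokens:
            results.append((str(path), tokens))
    return results
```

Running oversized_docs("docs") from the repository root would list the candidates for the next split.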

Measured from the repo on 2026-05-17 after the split commit.
