
Pipeline-as-Config: structural model dispatch (#2114)#2210

Draft
justinchuby wants to merge 13 commits into microsoft:main from justinchuby:squad/2114-pipeline-as-config


@justinchuby

Contributor

Summary

Implements the Pipeline-as-Config redesign from #2114 (PR1–5) by generalizing the existing DecoderOnlyPipeline executor rather than introducing the greenfield classes the issue sketched ("refactor, not rewrite"). The proposed PipelineExecutor/MultiSessionPipeline effectively already exist as DecoderOnlyPipelineState/DecoderOnlyPipelineModel.

Addresses #2114.

What's included

| PR | Change |
|---|---|
| PR1 | Config v2 schema: Config::version + Config::Pipeline; SAX v2 parse; TranslateV1ToPipeline / LowerPipelineToModel; pipeline_presets.* |
| PR2 | Structural CreatePipeline() delegated from CreateModel when a pipeline is present; legacy model.type dispatch preserved as the ClassifyLegacyRoute oracle; v2 tokens/generation/metadata lowering |
| PR3 | PipelineFlow (init/step/final phases, explicit dataflow[], DFS cycle detection, 10-stage guard) + Finalize hook |
| PR4 | Plugin escape hatch: opaque C-ABI plugin_api.h + plugin_loader, gated behind USE_GENAI_PLUGINS (OFF by default) |
| PR5 | ClassifyStructuralRoute replaces model.type dispatch for CP2/CP3/CP4, guarded by a zero-regression gate asserting structural == legacy for every in-tree fixture |
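To make the PR3 load-time guardrails concrete, here is a minimal sketch of DFS cycle detection plus a stage-count cap. It assumes dataflow edges can be reduced to stage-to-stage pairs; StageEdge, ValidateFlow, and HasCycleFrom are hypothetical names, not the PipelineFlow API in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one dataflow[] edge reduced to stage names:
// the output of stage `from` feeds stage `to`. Not the PR's own type.
struct StageEdge {
  std::string from;
  std::string to;
};

constexpr std::size_t kMaxStages = 10;  // mirrors the 10-stage guard

enum class Mark { kUnvisited, kInProgress, kDone };

// DFS: reaching a node that is still in progress means a back edge, i.e. a cycle.
bool HasCycleFrom(const std::string& node,
                  const std::map<std::string, std::vector<std::string>>& adj,
                  std::map<std::string, Mark>& mark) {
  mark[node] = Mark::kInProgress;
  auto it = adj.find(node);
  if (it != adj.end()) {
    for (const std::string& next : it->second) {
      const Mark m = mark[next];  // missing entries value-initialize to kUnvisited
      if (m == Mark::kInProgress) return true;
      if (m == Mark::kUnvisited && HasCycleFrom(next, adj, mark)) return true;
    }
  }
  mark[node] = Mark::kDone;
  return false;
}

// Load-time validation: reject oversized pipelines and cyclic wiring.
void ValidateFlow(const std::vector<std::string>& stages,
                  const std::vector<StageEdge>& edges) {
  if (stages.size() > kMaxStages)
    throw std::invalid_argument("pipeline has more than 10 stages");
  std::map<std::string, std::vector<std::string>> adj;
  for (const StageEdge& e : edges) adj[e.from].push_back(e.to);
  std::map<std::string, Mark> mark;
  for (const std::string& s : stages)
    if (mark[s] == Mark::kUnvisited && HasCycleFrom(s, adj, mark))
      throw std::invalid_argument("dataflow cycle reachable from stage '" + s + "'");
}
```

Rejecting at load time keeps the per-token step loop free of any graph checks.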

Backward compatibility

  • model_type.h predicates are retained: every one still has a live non-dispatch caller (v1→v2 translator, generators, kv_cache, RNNT context-length guard, legacy oracle). Only their dispatch role was removed, so the file shrinks in role instead of being deleted as the issue proposed.
  • All checked-in genai_config.json fixtures route identically under structural and legacy dispatch (PipelineDispatchTests gate).
  • The new qwen2-5-vl-pipeline gate fixture surfaced a real divergence, now fixed: has_vision recognizes the vision.pipeline[] shape, not just vision.filename.
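The dispatch-equivalence gate can be illustrated with a toy pair of routers. This is a sketch under stated assumptions: FixtureConfig, LegacyRouteOf, StructuralRouteOf, and the model_type strings are illustrative stand-ins, not ClassifyLegacyRoute/ClassifyStructuralRoute themselves. It shows why a vision.pipeline[]-shaped config needs has_vision to be true, and how an unknown type string falls through the structural catch-all.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Route { kDecoderOnly, kDecoderOnlyPipeline, kMultiModal, kWhisper, kUnsupported };

// Hypothetical, heavily reduced view of a parsed genai_config.json.
struct FixtureConfig {
  std::string model_type;     // legacy model.type string
  bool has_encoder;           // encoder session present
  bool has_vision;            // vision.filename OR vision.pipeline[] present
  bool has_decoder_pipeline;  // decoder.pipeline[] present
};

// Toy legacy oracle: dispatch on the declared type string (illustrative values).
Route LegacyRouteOf(const FixtureConfig& c) {
  if (c.model_type == "whisper") return Route::kWhisper;
  if (c.model_type == "qwen2_5_vl") return Route::kMultiModal;
  if (c.model_type == "decoder-pipeline") return Route::kDecoderOnlyPipeline;
  if (c.model_type == "gpt2") return Route::kDecoderOnly;
  return Route::kUnsupported;
}

// Toy structural router: detect the shape, ignore the type string.
Route StructuralRouteOf(const FixtureConfig& c) {
  if (c.has_encoder) return Route::kWhisper;
  if (c.has_vision) return Route::kMultiModal;
  if (c.has_decoder_pipeline) return Route::kDecoderOnlyPipeline;
  return Route::kDecoderOnly;  // catch-all: unknown configs fail open here
}

// The gate: list every fixture whose two routes disagree.
std::vector<std::string> Divergences(const std::vector<FixtureConfig>& fixtures) {
  std::vector<std::string> out;
  for (const FixtureConfig& f : fixtures)
    if (LegacyRouteOf(f) != StructuralRouteOf(f)) out.push_back(f.model_type);
  return out;
}
```

A gate like this only proves agreement on the fixtures it is given; it says nothing about configs outside the list.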

Testing

  • unit_tests: 76 passed / 0 failed / 21 skipped (skips are env-gated: TensorRT-RTX, StreamingASR, Parakeet).
  • New suites green: PipelineConfigTests, PipelineFlowTests, PluginLoaderTests, PipelineDispatchTests.
  • gpt2-fp32 + lfm2-fp32 e2e (CPU + CUDA) pass.
  • Plugin ON-path (USE_GENAI_PLUGINS=1) compiles.

Not yet exercised end-to-end (reviewer attention welcome)

  • The multi-stage VLM RunStage init/step loop and the Finalize final-stage path. There is no in-tree generic decoder.pipeline[] VLM fixture with ONNX weights, so behavior preservation rests on the dispatch-equivalence gate plus reasoning, not on a running multi-session VLM.
  • Downloaded-model Python tests and TensorRT/StreamingASR paths (environment-gated).

Descoped to v2.1+ (per maintainer comments in #2114)

TTS single_pass, diffusion denoising, RNNT loop strategies as plugin, when: "final" vocoder, repeat/counter. Encoder-decoder remains Whisper/Marian (routed structurally, not unified into the executor).

🤖 Draft — opening for early architectural review given the e2e caveats above.

justinchuby and others added 13 commits June 9, 2026 17:06
Logits::Get() only converted the model's raw logits to float32 when the
output type was Float16. For BFloat16 the conversion was skipped and the
subsequent WrapTensor<float> reinterpreted the raw 2-byte bf16 values as
4-byte float32, corrupting every logit (wrong argmax, incoherent
generation). The identical model in Float16 worked correctly.

Treat BFloat16 the same as Float16 in both the fp32 staging-buffer
allocation and the Cast to float32. Add an on-device CUDA bf16->f32 cast
(LaunchBf16ToFp32) so the conversion does not fall back to a host
round-trip; the CPU Cast path already supported bf16->f32.

Verified on a bf16 decoder (vocab 262144): first-token argmax now matches
the Float16 / HuggingFace reference and generation is identical to fp16.

Fixes microsoft#2202

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Implements the Pipeline-as-Config redesign (issue microsoft#2114, PR1-5) by
generalizing the existing DecoderOnlyPipeline executor rather than
introducing new classes ("refactor not rewrite").

- Config v2 schema: Config::version + Config::Pipeline; SAX v2 parse;
  TranslateV1ToPipeline / LowerPipelineToModel; pipeline_presets.
- Structural CreatePipeline() delegated from CreateModel when a pipeline
  is present; legacy model.type dispatch preserved as ClassifyLegacyRoute
  oracle.
- PipelineFlow (init/step/final phases, explicit dataflow, DFS cycle and
  10-stage guards) + Finalize hook.
- Plugin escape hatch: opaque C-ABI plugin_api.h + plugin_loader, gated
  behind USE_GENAI_PLUGINS (OFF by default).
- ClassifyStructuralRoute replaces model.type dispatch for CP2/CP3/CP4,
  guarded by a zero-regression gate asserting structural == legacy for
  every in-tree fixture.

Backward compatibility: model_type.h predicates are retained (they have live
non-dispatch callers); all 14 checked-in genai_config.json fixtures route
identically. The qwen2-5-vl-pipeline gate fixture surfaced a real
divergence, now fixed: has_vision also recognizes vision.pipeline[].

Tests: unit_tests 76 passed / 0 failed / 21 skipped (env-gated); new
PipelineConfig/Flow/Dispatch/PluginLoader suites green; gpt2 + lfm2 e2e
pass. WIP src/models/kv_cache.cpp intentionally excluded.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds examples/pipeline-config/ demonstrating the v2 schema, each verified
to parse, lower, and route against the current build:

- 01-preset-decoder       preset usage (autoregressive-decoder) -> Gpt/DecoderOnly
- 02-explicit-encoder-decoder  explicit multi-stage dataflow (init/step,
                          cross_attention_from, frozen cross_cache) -> Whisper
- 03-vlm-per-image        loop:"per_image" + mrope_3d vision pipeline -> MultiModal
- 04-plugin-escape-hatch  plugin opaque-handle shape (doc-only; needs
                          USE_GENAI_PLUGINS=ON)
- 05-v1-to-v2             legacy v1 gpt2 config beside its v2 equivalent
- README.md               walkthrough: version field, v1->v2 migration,
                          presets, flow phases + guardrails, plugin opt-in

Verification: test/pipeline_config_tests.cpp gains ExamplePipelineConfigs.*
(4 tests) that load examples 1/2/3/5 from EXAMPLES_PATH and assert parse +
lower + ClassifyStructuralRoute. 4/4 pass; PipelineConfig/Dispatch suites
11/11, no regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oft#2114)

Extends examples/pipeline-config/ with two more verified, parseable demos:

- 06-multimodal-single-pass  gemma4-style tri-modal (text + vision + audio):
                             vision -> image_features, speech -> audio_features,
                             embedding merges both -> inputs_embeds, decoder
                             consumes. Single-pass (no per-image loop, no mRoPE).
                             Routes -> MultiModal.
- 07-prefill-decode          decoder.pipeline[] split into a prefill stage
                             (run_on_prompt:true/run_on_token_gen:false) and a
                             decode stage (run_on_prompt:false/run_on_token_gen:
                             true), sharing KV via past_present_share_buffer.
                             TranslateV1ToPipeline derives prefill->init,
                             decode->step. Routes -> DecoderOnlyPipeline.
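The prefill/decode flag-to-phase derivation can be sketched as follows. LegacyStage, DerivePhase, and FlagsAgreeWithPhase are hypothetical; the sketch covers only the two flag combinations described above and assumes phases are plain strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduction of one legacy decoder.pipeline[] stage.
struct LegacyStage {
  std::string name;
  bool run_on_prompt;
  bool run_on_token_gen;
};

// Prefill-only stages map to "init", decode-only stages to "step".
// Other flag combinations are outside this sketch and yield "".
std::string DerivePhase(const LegacyStage& s) {
  if (s.run_on_prompt && !s.run_on_token_gen) return "init";
  if (!s.run_on_prompt && s.run_on_token_gen) return "step";
  return "";
}

// If a config carries both the legacy flags and a declared phase, the two
// should agree, since init/step gating still reads the legacy flags.
bool FlagsAgreeWithPhase(const LegacyStage& s, const std::string& declared_phase) {
  return DerivePhase(s) == declared_phase;
}
```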

README gains tri-modal vs per-image (06 vs 03) and prefill/decode sections.
ExamplePipelineConfigs gains MultiModalSinglePass and PrefillDecodeSplit,
asserting parse + lower + route + stage flags/sessions. Reviewed by Rusty
(APPROVE-WITH-NITS, no blocking issues); 17/17 example/config/dispatch tests
pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR-A of the v2.1 speculative-decoding groundwork. Adds the KV-cache
rollback prerequisite that speculative decoding's accept/reject step
will build on, without introducing any new schema.

- DecoderOnlyPipelineState::RewindTo override: drains outstanding async
  partial KV-cache updates, then rewinds position inputs, key-value
  cache, and recurrent state (mirrors DecoderOnly_State::RewindTo).
- generators.cpp: remove "decoder-pipeline" from the RewindToLength
  throw list; whisper/phi3v/lfm2 still throw.
- New CAPITests.RewindDecoderPipelineFp32CAPI with a tiny self-consistent
  causal decoder-pipeline fixture proving token-for-token identical
  continuation after RewindTo (KV + position state truly rolled back).
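A toy model of the rewind invariant being tested: after rewinding to length n and appending a token, the state should match a fresh state fed the same n+1 tokens. ToyDecoderState is hypothetical; it does not model async partial KV updates or recurrent state, and stores one float per token in place of real key/value tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct ToyDecoderState {
  std::vector<int> tokens;
  std::vector<float> kv;          // stand-in for past key/value rows
  std::size_t next_position = 0;  // stand-in for position_ids

  void Append(int token) {
    tokens.push_back(token);
    kv.push_back(static_cast<float>(token) * 0.5f);  // fake "computed" KV row
    ++next_position;
  }

  // Tokens, KV rows, and positions must all roll back together; otherwise
  // the next forward pass would attend to stale cache rows.
  void RewindTo(std::size_t length) {
    if (length > tokens.size()) throw std::out_of_range("cannot rewind forward");
    tokens.resize(length);
    kv.resize(length);
    next_position = length;
  }
};
```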

Build green; *Rewind* + model/CAPI/pipeline suites: 46 passed, 0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…framework

Design doc for evolving the Pipeline-as-Config schema (v2.0) toward
native support for speculative decoding and modern inference
optimizations, without breaking v2.0 (both remain version: 2; the
speculative/strategy block presence is the discriminator).

Covers: speculative flow strategy, multi-session draft/target roles,
KV-cache rollback/checkpoint (PR-A, landed), variable tokens/step with
token-tree attention, intermediate hidden-state dataflow edges
(EAGLE/MTP), an ordered logit-processor/sampler chain, a runtime-vs-
build-time feature namespace, and a controller-plugin escape hatch.
Includes a dependency-ordered PR plan (PR-A -> PR-B -> {PR-C, PR-D};
PR-E/PR-F independent). Reviewed by Livingston (APPROVE-WITH-NITS);
citation nits addressed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…or (PR-B)

PR-B of the v2.1 speculative-decoding groundwork, stacked on PR-A's
KV-cache rollback. Adds the `speculative` flow strategy and multi-session
draft/target roles to the v2.1 schema, plus a working vanilla draft-target
executor. Both v2.0 and v2.1 stay version: 2; the strategy/roles block
presence is the discriminator.

- config.{h,cpp}: parse `roles` and `strategy` (speculative) blocks with
  nested draft/ngram/verify/tree, block-presence gated.
- src/models/speculative_decoder.{h,cpp}: SpeculativeDecoder composes a
  target and a draft Generator (each its own session + KV cache). Draft
  proposes K tokens greedily; target verifies all K in one forward pass;
  accept the longest matching prefix; commit target's greedy argmax (incl.
  bonus); roll back both roles via RewindToLength (PR-A). Output is
  token-for-token identical to plain greedy on the target.
- Logits::GetAll + State::GetRawLogits virtual + pipeline override +
  Generator::GetRawLogits expose full [batch,seq,vocab] logits for
  single-pass verify (all additive; normal decoding unperturbed).
- New SpeculativeDecodingTests (3) on real tiny target/draft fixtures:
  schema parse, greedy==baseline with a distinct draft (reject+rewind),
  and multi-token advance when draft==target.
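The greedy accept rule described above can be sketched as a pure function over token IDs. AcceptGreedy is a hypothetical name; it assumes the target's argmax has already been computed at all K+1 verified positions (the last being the bonus position) and leaves the RewindToLength rollback of both roles to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// draft:         K tokens proposed greedily by the draft role.
// target_argmax: the target's greedy argmax at each of the K+1 positions
//                scored in one verify pass; entry i is what the target would
//                emit after the prompt plus draft[0..i-1].
// Returns the tokens to commit this round (always at least one).
std::vector<int> AcceptGreedy(const std::vector<int>& draft,
                              const std::vector<int>& target_argmax) {
  assert(target_argmax.size() == draft.size() + 1);
  std::vector<int> committed;
  std::size_t i = 0;
  // Accept the longest prefix on which draft and target agree.
  while (i < draft.size() && draft[i] == target_argmax[i]) {
    committed.push_back(draft[i]);
    ++i;
  }
  // Then commit the target's own token: its correction at the first
  // mismatch, or the bonus token if all K draft tokens were accepted.
  committed.push_back(target_argmax[i]);
  return committed;
}
```

Every committed token equals the target's own argmax at that position, so the output matches plain greedy on the target regardless of draft quality; the draft only changes how many tokens land per round.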

Non-greedy acceptance, token-tree verify, and non-draft_model producers
(ngram/EAGLE) are parsed but throw — deferred to PR-C/PR-D.

Build green; SpeculativeDecodingTests + Pipeline + CAPI suites:
51 passed, 0 failed. Reviewed by Livingston (APPROVE-WITH-NITS).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR-C of the v2.1 speculative-decoding groundwork, stacked on PR-B.
Adds the EAGLE/MTP foundation (intermediate hidden-state edges) and
upgrades the PR-B tree stub to a verified linear-K fallback.

- Hidden-state edges (real, fully verified): State::GetHiddenStates
  virtual (additive, empty default) + DecoderOnlyPipelineState override
  reading the configured `decoder.outputs.hidden_states` intermediate
  activation from the ortvalue store, kept device-resident, cast
  fp16/bf16 -> fp32, normalized to [batch,seq,hidden]; Generator
  forwarder; schema parses the edge name and a dataflow wire
  (target.hidden_states -> eagle_draft.prev_hidden). This is the
  EAGLE/MTP prerequisite (draft consuming target hidden states).
- Token tree: medusa_choices now parsed; the executor degrades to a
  verified linear-K chain instead of throwing. Output stays greedy-
  equivalent (§10 invariant holds under tree verify).

True tree attention is deferred with a code-grounded reason: the
runtime PositionInputs only builds a 1D padding mask
(attention_mask_shape_ is {batch,seq}) and causal masking is hardcoded
in-graph via Trilu, so a per-(query,key) tree mask requires a model-side
[batch,1,q,kv] mask input (a build-time graph change). See design §11.
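For illustration, the per-(query,key) mask that tree verification would need looks like this when the speculation tree is given as parent indices. This is a sketch, not code from the PR: BuildTreeMask is hypothetical, the batch and head dimensions of the [batch,1,q,kv] input are omitted, and it assumes parent[i] < i. Sibling nodes must not see each other, which a causal (Trilu) mask cannot express.

```cpp
#include <cassert>
#include <vector>

// parent[i] is node i's parent within the speculation tree, or -1 if node i
// hangs directly off the last committed token. Returns a q x (prefix_len + q)
// mask where mask[i][k] == 1 iff query i may attend to key k.
std::vector<std::vector<int>> BuildTreeMask(const std::vector<int>& parent, int prefix_len) {
  const int q = static_cast<int>(parent.size());
  std::vector<std::vector<int>> mask(q, std::vector<int>(prefix_len + q, 0));
  for (int i = 0; i < q; ++i) {
    for (int k = 0; k < prefix_len; ++k) mask[i][k] = 1;  // whole committed prefix
    for (int j = i; j != -1; j = parent[j]) {             // self, then ancestors
      assert(parent[j] < j);                              // guarantees termination
      mask[i][prefix_len + j] = 1;
    }
  }
  return mask;
}
```

With parent = {-1, 0, 0}, nodes 1 and 2 are siblings: each sees node 0 and itself but not the other, whereas a causal mask would let node 2 attend to node 1.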

Build green; SpeculativeDecodingTests + Pipeline + CAPI suites:
57 passed, 0 failed. 6 new PR-C tests; PR-A/PR-B unaffected.
Reviewed by Livingston (APPROVE).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR-D of the v2.1 groundwork, stacked on PR-C. Generalizes the single
llguidance hook into a declarative, composable ordered chain of logit
processors applied before sampling, fully backward-compatible.

- Schema: search.logits_processors[] of typed ops (repetition_penalty,
  min_length, logit_bias, grammar, temperature, top_k, top_p, sample).
  Block-presence gated; version stays 2; unknown ops/keys throw.
- src/logits_processor_chain.{h,cpp}: LogitsProcessorOp interface +
  LogitsProcessorChain. repetition_penalty/min_length delegate to the
  existing Search scoring kernels; logit_bias is an in-place transform;
  grammar adapts the existing ConstrainedLogitsProcessor verbatim (incl
  Reset); temperature/top_k/top_p are realized by the existing fused
  sampler so numerics never diverge; sample is the terminal op.
- Back-compat: logits_chain_ is built ONLY when logits_processors is
  non-empty; otherwise the legacy guidance+sampling path runs byte-for-
  byte unchanged. Guarded by BackCompatDefaultMatchesLegacy.
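The ordered-chain idea can be sketched with a small op interface. LogitsOp, LogitBias, RepetitionPenalty, and LogitsChain are hypothetical names; the PR's ops delegate to the existing Search kernels and fused sampler instead, and a greedy argmax stands in here for the terminal sample op. The repetition penalty follows the common divide-positive/multiply-negative convention, which is an assumption about the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Each op rewrites the logits in place, given the tokens generated so far.
struct LogitsOp {
  virtual ~LogitsOp() = default;
  virtual void Apply(std::vector<float>& logits, const std::vector<int>& history) = 0;
};

struct LogitBias : LogitsOp {
  std::map<int, float> bias;
  explicit LogitBias(std::map<int, float> b) : bias(std::move(b)) {}
  void Apply(std::vector<float>& logits, const std::vector<int>&) override {
    for (const auto& [token, delta] : bias)
      if (token >= 0 && static_cast<std::size_t>(token) < logits.size()) logits[token] += delta;
  }
};

struct RepetitionPenalty : LogitsOp {
  float penalty;
  explicit RepetitionPenalty(float p) : penalty(p) {}
  void Apply(std::vector<float>& logits, const std::vector<int>& history) override {
    std::vector<bool> seen(logits.size(), false);  // penalize each token once
    for (int t : history)
      if (t >= 0 && static_cast<std::size_t>(t) < seen.size()) seen[t] = true;
    for (std::size_t i = 0; i < logits.size(); ++i)
      if (seen[i]) logits[i] = logits[i] > 0 ? logits[i] / penalty : logits[i] * penalty;
  }
};

struct LogitsChain {
  std::vector<std::unique_ptr<LogitsOp>> ops;
  // Ops run in declaration order; greedy argmax stands in for "sample".
  int Run(std::vector<float> logits, const std::vector<int>& history) const {
    for (const auto& op : ops) op->Apply(logits, history);
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i)
      if (logits[i] > logits[best]) best = i;
    return static_cast<int>(best);
  }
};
```

Order matters: applying logit_bias before or after a multiplicative penalty can give different results, which is why the chain is declared as an ordered list.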

Deferred (flagged): combine (contrastive/CFG) needs multi-session
logits; grammar e2e needs USE_GUIDANCE=ON (op throws clearly otherwise);
speculative-path integration; scalar sampler ops are realized by the
terminal fused sampler regardless of relative position.

Build green; LogitsChain + Sampling + Speculative + Pipeline + CAPI
suites: 77 passed, 0 failed. Reviewed by Livingston (APPROVE-WITH-NITS).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR-E of the v2.1 groundwork, stacked on PR-D. Adds the bucket-C escape
hatch: a plugin that drives a custom generation loop (e.g. Lookahead
Jacobi n-gram pools, nested cascades) that cannot be expressed as a
static DAG. The ABI exposes existing step primitives only; it adds no
new engine behavior.

- Schema: pipeline.controller {library, entry_point, config} (optional,
  block-presence gated; version stays 2; absent => no behavior change).
- C ABI (plugin_api.h): OgaDecodeController / OgaDecodeStepContext vtable
  exposing token append/get, forward step, logits (PR-B), hidden states
  (PR-C), rewind (PR-A), EOS/length queries.
- Host dispatch (controller_host.{h,cpp}, ungated): when a controller is
  configured, GenerateNextToken delegates to controller->Step(); the
  plugin calls back into the Generator's existing primitives via the
  vtable. controller_ is null otherwise; legacy path byte-for-byte
  unchanged.
- Loader: real dlopen is USE_GENAI_PLUGINS-gated; disabled builds throw a
  clear "rebuild with USE_GENAI_PLUGINS=ON" error (no silent skip).
- Fix: AppendAcceptedTokens commits via the SelectTop greedy path instead
  of Search::AppendTokens, which called ResetDone() and wrote one-past
  the sequence row at max_length (heap corruption). SelectTop guards with
  if(!done_) and preserves termination.

Real external .so loading is build-gated (this build is
USE_GENAI_PLUGINS=OFF); the primitive surface is proven by an in-tree stub
controller that reproduces plain greedy token-for-token through the vtable only.
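A greedy controller that touches the engine only through a C-style vtable can be sketched as below. HypStepContext, its function-pointer fields, and the ToyHost callbacks are hypothetical; they do not reproduce the actual OgaDecodeController/OgaDecodeStepContext layout in plugin_api.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical C-compatible vtable: only plain C types cross the boundary,
// and the plugin sees an opaque host pointer, never engine classes.
struct HypStepContext {
  void* host;
  const float* (*get_logits)(void* host, std::size_t* vocab_size);
  void (*append_token)(void* host, std::int32_t token);
  int (*is_done)(void* host);
};

// Stub controller: one greedy token per Step, using the vtable only.
void GreedyControllerStep(const HypStepContext* ctx) {
  if (ctx->is_done(ctx->host)) return;
  std::size_t vocab = 0;
  const float* logits = ctx->get_logits(ctx->host, &vocab);
  std::size_t best = 0;
  for (std::size_t i = 1; i < vocab; ++i)
    if (logits[i] > logits[best]) best = i;
  ctx->append_token(ctx->host, static_cast<std::int32_t>(best));
}

// Toy host standing in for the Generator: one fixed logits row per step.
struct ToyHost {
  std::vector<std::vector<float>> logits_per_step;
  std::vector<std::int32_t> tokens;
  std::size_t max_length;
};

const float* ToyGetLogits(void* h, std::size_t* vocab_size) {
  auto* host = static_cast<ToyHost*>(h);
  const std::vector<float>& row = host->logits_per_step[host->tokens.size()];
  *vocab_size = row.size();
  return row.data();
}

void ToyAppendToken(void* h, std::int32_t token) {
  static_cast<ToyHost*>(h)->tokens.push_back(token);
}

int ToyIsDone(void* h) {
  const auto* host = static_cast<ToyHost*>(h);
  return host->tokens.size() >= host->max_length ? 1 : 0;
}
```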

Build green; Controller + LogitsChain + Speculative + Pipeline + CAPI
suites: 68 passed, 0 failed (7/7 ControllerHookTests). PR-A/B/C/D green.
Reviewed by Livingston (APPROVE-WITH-NITS).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Final PR-F of the v2.1 groundwork, stacked on PR-E. Adds a clear
session_options namespace separating runtime-toggleable features from
build-time graph properties, following the "declared, never synthesized"
principle.

- Schema: session_options.runtime.* (kv_cache{dtype,quant}, paging,
  prefix_cache, sliding_window, chunked_prefill, precision) for features
  schedulable at load/session time with no graph change; and
  session_options.build_requires.* (attention, quantization, extra_heads)
  for properties baked into the exported ONNX graph. Both std::optional,
  block-presence gated, version stays 2.
- Validation (ValidateSessionOptionsFeatures): unknown enums throw, and
  cross-namespace misuse throws a clear namespaced error (e.g. a build
  quant token like awq in runtime.kv_cache.dtype points the user at
  build_requires.quantization, and vice versa).
- Back-compat: SessionOptions_Element previously had no OnObject (nested
  objects threw); adding runtime/build_requires is strictly additive --
  scalar keys still route to config_entries unchanged and every other
  nested object key still throws. Validator is a no-op when both absent.
- Plumbing point in CreateSessionOptionsFromConfig is a guarded warning
  only: NO runtime feature is applied and build_requires is never acted
  upon (declared, never synthesized). Per-feature numeric runtime effects
  (KV dtype/quant, paging, prefix cache, chunked prefill) are deferred
  per design section 9.
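The cross-namespace error for runtime.kv_cache.dtype might look like this. ValidateRuntimeKvDtype and both token sets are illustrative assumptions, not the PR's actual enums or the ValidateSessionOptionsFeatures signature.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Illustrative token sets only; the PR's real enums may differ.
const std::set<std::string> kRuntimeKvDtypes = {"fp16", "bf16", "fp8", "int8"};
const std::set<std::string> kBuildQuantizations = {"awq", "gptq", "int4"};

// Accept runtime KV dtypes; point misplaced build-time tokens at the right
// namespace; reject anything else as unknown.
void ValidateRuntimeKvDtype(const std::string& dtype) {
  if (kRuntimeKvDtypes.count(dtype) != 0) return;
  if (kBuildQuantizations.count(dtype) != 0)
    throw std::invalid_argument(
        "session_options.runtime.kv_cache.dtype: '" + dtype +
        "' is a build-time quantization; declare it under "
        "session_options.build_requires.quantization");
  throw std::invalid_argument(
      "session_options.runtime.kv_cache.dtype: unknown value '" + dtype + "'");
}
```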

Build green; RuntimeFeatureNamespace + Config + Speculative + LogitsChain
+ ControllerHook + Pipeline + CAPI suites: 74 passed, 0 failed
(7/7 RuntimeFeatureNamespaceTests). PR-A..PR-E all green.
Reviewed by Livingston (APPROVE).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A concise, decision-oriented RFC complementing the detailed v2.1 design
(linked as the deep appendix). Frames the speculative-decoding +
inference-optimization work for a team meeting: TL;DR, motivation,
goals/non-goals, a capability-status table, six key decisions to discuss
(each with options, tradeoffs, and a recommendation), per-PR prototype
evidence with honest deferrals and code-grounded reasons, open questions,
and the phased PR plan (PR-A..PR-F, all landed on this branch).

Reviewed by Livingston for factual accuracy (APPROVE-WITH-NITS): all
commit SHAs, cited test names, and load-bearing code-facts verified
against the tree; citation nits fixed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tions

Expand the v2.1 discussion RFC so a single team review covers:
- NEW v2.0 base schema (Pipeline-as-Config) as a discussion item: version:2
  schema, structural/block-presence routing (CreateModel->CreatePipeline->
  ClassifyStructuralRoute) replacing model_type dispatch, the Wire{from,to}
  dataflow concept, v1->v2 migration (TranslateV1ToPipeline / example 05),
  and six v2.0-level key decisions framed for discussion.
- NEW audio-to-audio (speech-to-speech) forward-compat section: v2.1 schema
  is an additive superset; audio-out wiring is expressible without a v2.0
  break, with executor-level constraints (single int32 token stream + single
  vocab_size, text-only sink) framed as open discussion items.

Sections renumbered; intra-RFC cross-references fixed. Reviewed by Livingston
(APPROVE-WITH-NITS, all code-fact citations verified accurate); nits fixed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms

Contributor

Review from a model-builder / CP6 perspective

I went through this with a focus on the producer↔consumer contract (this PR is the C++ config consumer; the Python model builder is the producer that would emit these configs). I traced schema → parser → router → executor against the diff. Net takeaway: today the v2 pipeline block is effectively a routing/metadata overlay — the real execution source of truth is still the legacy model.* structures. That's consistent with the honest config.h note ("default consumers continue to read config.model.*"), but the RFC/examples read as if pure-v2 is runnable, which I don't think it is yet. Details below, grounded in the diff.

Critical

  1. v2 pipeline is not lowered into executable stages. LowerPipelineToModel only backfills session filenames + derives context_length; it doesn't materialize flow[]/dataflow[] into model.decoder.pipeline[]/output_names_forwarder, nor lower state.*. PipelineFlow sizes stage_phase_ to model.decoder.pipeline.size() and merely refines the phase of pre-existing legacy stages. So a pure-v2 multi-stage config (e.g. examples 03-vlm-per-image, 06-multimodal-single-pass) routes correctly but has zero stages to execute. A producer would still have to emit the full legacy decoder.pipeline[] + per-session inputs/outputs.

  2. KV-cache / position inputs still read model.decoder.inputs.*, not pipeline.state. kv_cache.cpp / position_inputs.cpp were not switched to read pipeline.state.kv_cache / position_ids, and LowerPipelineToModel doesn't lower those names. A pure-v2 config that omits model.decoder.inputs would hit empty KV/position names. Relatedly, Whisper cross-attention names (cross_past_key_names, …) have no home in Config::Pipeline::State::CrossCache, so encoder-decoder cross-cache can't be expressed in v2.

  3. pipeline.strategy / roles appear orphaned from the load path. ClassifyStructuralRoute and the Generator constructor never read pipeline.strategy; SpeculativeDecoder is constructed only in speculative_decoding_tests.cpp and exposed via no public API. So a config carrying a valid strategy block, loaded through the normal OgaModel/Generator path, would silently fall back to ordinary decoding (no draft, no speedup, no error). roles / verify.session / draft.session are parsed but never consumed or validated. (Design §3.3 describes a LowerPipelineToModel strategy branch that isn't implemented — that seems to be the root cause.)

Major

  1. init/step gating still uses run_on_prompt / run_on_token_gen, not flow.when. flow.when only drives the final phase; for init/step the executor reads the legacy flags. A producer must keep both in sync.
  2. dataflow drops the source session. explicit_wires_ is keyed by (to_session, to_tensor) and stores only from_tensor, then resolves against ortvalue_store_ by bare tensor name. The schema endpoint format is session.tensor, but semantics collapse to tensor — so intermediate tensor names must be globally unique across sessions, which isn't documented.
  3. flow/dataflow endpoints aren't validated. Dangling or misspelled from/to/run references are silently ignored rather than rejected at load time.
  4. Structural routing fails open. ClassifyStructuralRoute returns DecoderOnly as a catch-all for residual configs, whereas legacy dispatch returned Unsupported for unknown model.type — an unknown/misspelled decoder config now loads silently as a plain decoder.
  5. variable_resolution and pure-v2 session_options are parsed but not consumed. CreateVisionState needs flow[].variable_resolution to pick Pixtral vs Qwen, but a pure-v2 Pixtral can't express it; v2 pipeline.sessions[].session_options (providers/provider_options) aren't lowered, so they're effectively ignored.
  6. The equivalence gate is fixture-coverage, not a soundness proof. PipelineDispatchTests asserts structural == legacy over a hardcoded in-tree fixture list; it can't cover structurally-identical-but-different models outside that set (the central "detect, don't declare" bet).
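A sketch of the load-time check suggested by Major 2 and 3: parse each endpoint as session.tensor, reject unknown sessions, and key the resolved wire by (session, tensor) on both ends so tensor names need not be globally unique. Endpoint, ParseEndpoint, and ResolveWires are hypothetical names, and splitting at the first '.' assumes session names contain no dots.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Endpoint {
  std::string session;
  std::string tensor;
};

using TensorKey = std::pair<std::string, std::string>;  // (session, tensor)

// Split "session.tensor" at the first '.'; the tensor part may contain dots.
Endpoint ParseEndpoint(const std::string& s) {
  const std::size_t dot = s.find('.');
  if (dot == std::string::npos || dot == 0 || dot + 1 == s.size())
    throw std::invalid_argument("dataflow endpoint '" + s + "' is not session.tensor");
  return {s.substr(0, dot), s.substr(dot + 1)};
}

// Maps each consumer (session, tensor) to its producer (session, tensor),
// failing at load time on any endpoint that names an unknown session.
std::map<TensorKey, TensorKey> ResolveWires(
    const std::vector<std::pair<std::string, std::string>>& wires,  // {from, to}
    const std::set<std::string>& sessions) {
  std::map<TensorKey, TensorKey> resolved;
  for (const auto& [from, to] : wires) {
    const Endpoint src = ParseEndpoint(from);
    const Endpoint dst = ParseEndpoint(to);
    for (const Endpoint* e : {&src, &dst})
      if (sessions.count(e->session) == 0)
        throw std::invalid_argument("dataflow references unknown session '" + e->session + "'");
    resolved[{dst.session, dst.tensor}] = {src.session, src.tensor};
  }
  return resolved;
}
```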

Docs / clarity (would mislead a producer)

  • The grammar logit-op example documents grammar/stateful fields that the parser discards (no struct fields) — copying it yields silently unconstrained output.
  • 07-prefill-decode is a v1 config presented as the prefill/decode how-to; there's no v2 equivalent for that topology.
  • No required-vs-optional field reference table (e.g. tokens.eos, generation.max_length are effectively required but undocumented as such).
  • pipeline.plugin (model construction) vs pipeline.controller (decode loop) are easy to confuse and only plugin appears in an example.
  • sessions serializes as a JSON object keyed by name, but config.h declares std::vector<Session> with no hint — a producer building from the header would emit an array and fail to parse.

Nice work

  • ClassifyLegacyRoute vs ClassifyStructuralRoute + the equivalence test is a solid regression-safety pattern.
  • PipelineFlow's DFS cycle detection + 10-stage guard are good load-time guardrails.
  • The greedy speculative loop in speculative_decoder.cpp looks token-for-token correct under greedy (accept-longest-prefix + always-commit-≥1 + rollback re-seat), and unsupported producers/acceptance/tree all throw rather than silently misbehaving.

Questions

  • Is the Python builder expected to emit native v2, or keep emitting v1 and rely on TranslateV1ToPipeline as the canonical path?
  • Are flow/dataflow/state meant to be lowered into executable stages (and KV/position consumers switched to read pipeline.state), or is v2 a routing/metadata overlay for now?
  • Is speculative-via-config in scope for this PR, or explicitly prototype-only?
  • For the untested init/step/final + dataflow[] path: would you prefer a synthetic N-session toy-ONNX fixture or a VLM-shaped one, and does structural detection accept initializer-only dummy graphs? Happy to contribute that fixture on a branch stacked on this PR.
