🤖 Cross-Engine Agent Team: Orchestrate Multi-Engine Collaborative Workflows #90

@realDuang

💡 Vision

CodeMux already unifies multiple AI coding engines (OpenCode, Copilot, Claude Code) under a single gateway. The next leap is enabling these engines to work together as a team: automatically decomposing complex tasks, distributing subtasks across engines and sessions in parallel (each in an isolated environment), and aggregating the results back to the user.

This is not about wrapping a single engine's sub-agent system (like Claude Code's internal Agent tool). Instead, it's about building a cross-engine orchestration layer that is unique to CodeMux's multi-engine architecture, something no single-engine tool can achieve.

🎯 Core Concept

User sends a complex task (e.g., "Refactor auth module and add tests")
      ↓
┌──────────────────────────────────────────────────────┐
│            Orchestrator (new layer)                  │
│  1. Analyze & decompose task                         │
│  2. Assign subtasks to engines/sessions              │
│  3. Monitor progress                                 │
│  4. Collect & synthesize results                     │
└──┬──────────────┬──────────────┬─────────────────────┘
   │              │              │
   ▼              ▼              ▼
┌────────┐   ┌────────┐   ┌────────┐
│ Engine │   │ Engine │   │ Engine │   ← Same or different engines
│ Claude │   │Copilot │   │OpenCode│
│ wt-1   │   │ wt-2   │   │ wt-3   │   ← Isolated worktrees
│"search │   │"write  │   │"run    │
│ & plan"│   │ tests" │   │ build" │
└────┬───┘   └────┬───┘   └────┬───┘
     │            │            │
     └────────────┼────────────┘
                  ▼
       Orchestrator merges results
                  ↓
         Unified response to user

🧭 Claude Agent Team vs CodeMux Orchestration

Claude Code already has built-in Agent Team capabilities (AgentInput.team_name, agents, TeammateIdle hook) for sub-agent parallelism within a single session. This is fundamentally different from what CodeMux orchestration provides:

| Aspect | Claude Agent Team | CodeMux Orchestration |
|---|---|---|
| Orchestration layer | Inside Claude session | CodeMux application layer |
| Engine scope | Claude only | Cross-engine (Claude + Copilot + OpenCode) |
| Context | Sub-agents share parent session context | Sessions are fully independent; the Orchestrator injects context |
| Isolation | Optional worktree | Each subtask gets an independent worktree |
| User control | Claude decides internally | Users review and edit the decomposition plan in the UI |

Conclusion: Intra-engine parallelism is deferred to Agent Team's own mechanism. CodeMux focuses on cross-engine orchestration and user-controlled task decomposition. Even with a single engine, CodeMux provides true file-level worktree isolation that Agent Team cannot achieve.

✨ Roadmap

Phase 0: Sub-agent Visibility ✅ (PR #99, merged)

Capture previously-dropped SDK messages (task_started, task_progress, task_notification, tool_progress) in ClaudeCodeAdapter and surface real-time sub-agent activity in the UI:

  • RunningToolCard shows current subtool name and tool-use count during execution
  • TaskTool completed state displays AI-generated summary and tool-use stats
  • Status bar appends active subtool name (e.g., "Delegating work · Fix the bug (Bash)")

Phase 1: Cross-Engine Task Orchestration 🚧

Core Architecture: Hub-and-Spoke

Sessions do not communicate directly. All information flows through the Orchestrator (an LLM session):

                    ┌───────────────────────────────┐
                    │         Orchestrator          │
                    │         (LLM session)         │
                    │                               │
                    │  • Holds global context       │
                    │  • Manages DAG dependency     │
                    │  • Injects upstream results   │
                    │  • Decides next round actions │
                    └─┬─────────────┬─────────────┬─┘
                      │             │             │
            results ↑ │             │             │ ↓ dispatch + inject context
                      │             │             │
                  ┌───┴───┐     ┌───┴───┐     ┌───┴───┐
                  │ S1    │     │ S2    │     │ S3    │
                  │Claude │     │Copilot│     │Claude │
                  │ wt-1  │     │ wt-2  │     │ wt-3  │
                  └───────┘     └───────┘     └───────┘

Execution Model: DAG + Multi-Round Iteration

Rather than simple "parallel dispatch → wait all → aggregate", execution follows a dependency graph with iterative rounds:

Example: "Refactor auth module, add tests, then verify everything passes"

LLM decomposes into a DAG:

  [Analyze arch] ──→ [Write tests] ──→ [Verify integration]
  (Claude)           (Copilot)         (Claude)
       └──────────→ [Refactor impl] ─↗
                    (Claude)

  Round 1            Round 2            Round 3
  (no dependencies)  (parallel, deps    (waits for
                      satisfied)         all above)
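
The round labels in the diagram above can be derived from the dependency list alone: a subtask's round is one more than the highest round among its dependencies. Below is a minimal TypeScript sketch of that idea. The `SubtaskNode` shape and the `computeRounds` helper are hypothetical names used for illustration, not existing CodeMux code.

```typescript
// Hypothetical minimal shape; only the fields needed for round assignment.
interface SubtaskNode {
  id: string;
  dependsOn: string[];
}

// Returns a map from subtask id to its 1-based round.
// Throws on unknown dependencies and on dependency cycles.
function computeRounds(subtasks: SubtaskNode[]): Map<string, number> {
  const byId = new Map<string, SubtaskNode>();
  for (const t of subtasks) byId.set(t.id, t);

  const rounds = new Map<string, number>();
  const visiting = new Set<string>(); // ids on the current DFS path

  const visit = (id: string): number => {
    const known = rounds.get(id);
    if (known !== undefined) return known;
    const node = byId.get(id);
    if (node === undefined) throw new Error(`Unknown subtask: ${id}`);
    if (visiting.has(id)) throw new Error(`Dependency cycle at: ${id}`);

    visiting.add(id);
    let round = 1; // no dependencies: runs in Round 1
    for (const dep of node.dependsOn) {
      round = Math.max(round, visit(dep) + 1);
    }
    visiting.delete(id);
    rounds.set(id, round);
    return round;
  };

  for (const t of subtasks) visit(t.id);
  return rounds;
}

// The example DAG from the diagram above.
const exampleRounds = computeRounds([
  { id: "analyze", dependsOn: [] },
  { id: "tests", dependsOn: ["analyze"] },
  { id: "refactor", dependsOn: ["analyze"] },
  { id: "verify", dependsOn: ["tests", "refactor"] },
]);
// exampleRounds: analyze -> 1, tests -> 2, refactor -> 2, verify -> 3
```

The same traversal doubles as a cycle check, which may be worth running before a plan is accepted for execution.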

Execution loop:

while (unfinished subtasks exist) {
  1. Find all subtasks whose dependencies are satisfied but not yet started
  2. Collect upstream result summaries, inject into downstream prompts
  3. Dispatch these subtasks in parallel (create worktree + session + send message)
  4. Wait for any running subtask to complete
  5. Extract completed subtask's result summary
  6. Go back to step 1
}
Aggregate final results
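
The loop above could be sketched in TypeScript roughly as follows. This is an illustration under stated assumptions, not the planned OrchestratorService: `runDag` and the `Dispatch` callback are hypothetical names, the `Subtask` type is a trimmed copy of the data model, and failing a subtask when one of its dependencies fails is an assumed policy that the issue does not specify.

```typescript
type SubtaskStatus = "blocked" | "pending" | "running" | "completed" | "failed";

interface Subtask {
  id: string;
  description: string;
  dependsOn: string[];
  status: SubtaskStatus;
  resultSummary?: string;
  error?: string;
}

// Hypothetical dispatcher: creates the worktree and session, sends the prompt
// (with upstream summaries injected), and resolves with the result summary.
type Dispatch = (task: Subtask, upstreamSummaries: string[]) => Promise<string>;

async function runDag(subtasks: Subtask[], dispatch: Dispatch): Promise<void> {
  const byId = new Map<string, Subtask>();
  for (const t of subtasks) byId.set(t.id, t);

  // In-flight subtasks; each promise resolves to the subtask id once it settles.
  const running = new Map<string, Promise<string>>();
  const isDone = (t: Subtask): boolean =>
    t.status === "completed" || t.status === "failed";

  while (subtasks.some((t) => !isDone(t))) {
    let progressed = false;

    for (const t of subtasks) {
      if (t.status === "running" || isDone(t)) continue;
      const deps = t.dependsOn.map((id) => byId.get(id));

      // Assumption: a failed or missing dependency fails the downstream subtask.
      if (deps.some((d) => d === undefined || d.status === "failed")) {
        t.status = "failed";
        t.error = "A dependency failed or does not exist";
        progressed = true;
        continue;
      }
      // Step 1: skip subtasks whose dependencies are not all completed yet.
      if (!deps.every((d) => d !== undefined && d.status === "completed")) {
        t.status = "blocked";
        continue;
      }
      // Step 2: collect upstream summaries. Step 3: dispatch without awaiting.
      const upstream = deps.map((d) => d?.resultSummary ?? "");
      t.status = "running";
      progressed = true;
      running.set(
        t.id,
        dispatch(t, upstream).then(
          (summary) => {
            t.status = "completed";
            t.resultSummary = summary; // Step 5: keep the result summary
            return t.id;
          },
          (err: unknown) => {
            t.status = "failed";
            t.error = String(err);
            return t.id;
          },
        ),
      );
    }

    if (running.size > 0) {
      // Steps 4 and 6: wait for any one subtask to settle, then re-scan.
      const settledId = await Promise.race(Array.from(running.values()));
      running.delete(settledId);
    } else if (!progressed) {
      throw new Error("No runnable subtasks left: dependency cycle?");
    }
  }
}
```

Waiting with `Promise.race` rather than `Promise.all` is what lets a downstream subtask start as soon as its own dependencies finish, instead of waiting for the whole round.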

Context injection (the core mechanism for inter-session communication):

When a downstream subtask starts, the Orchestrator injects upstream results into its prompt:

Message sent to the "Write tests" session:

Your task: Write unit tests for the auth module

## Upstream Task Results

### [Analyze architecture] output:
- Core modules: jwt/, session/, oauth/
- Entry point: src/auth/index.ts
- Current test coverage: 12%
- Key finding: token refresh logic has zero test coverage

Based on the analysis above, write comprehensive unit tests.
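
A prompt with the layout shown above could be assembled by a small helper like the one below. This is a sketch only: `UpstreamResult` and `buildSubtaskPrompt` are hypothetical names, and the generic closing sentence is an assumption (the example message above ends with a task-specific line instead).

```typescript
// Hypothetical shape for a finished upstream subtask.
interface UpstreamResult {
  title: string; // e.g. "Analyze architecture"
  summary: string; // result summary extracted by the Orchestrator
}

// Builds the message sent to a downstream session, following the layout above.
function buildSubtaskPrompt(task: string, upstream: UpstreamResult[]): string {
  const parts: string[] = [`Your task: ${task}`];
  if (upstream.length > 0) {
    parts.push("## Upstream Task Results");
    for (const u of upstream) {
      parts.push(`### [${u.title}] output:\n${u.summary.trim()}`);
    }
    // Assumption: a generic closing line instead of a task-specific one.
    parts.push("Use the upstream results above as context for your task.");
  }
  return parts.join("\n\n");
}

const writeTestsPrompt = buildSubtaskPrompt("Write unit tests for the auth module", [
  {
    title: "Analyze architecture",
    summary: "- Core modules: jwt/, session/, oauth/\n- Entry point: src/auth/index.ts",
  },
]);
```

Subtasks without dependencies get only the task line, so Round 1 prompts stay unchanged.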

UX Design: Explicit UI, No Slash Commands

Entry point: A new "Team Tasks" collapsible section in the sidebar (between Active Sessions and Scheduled Tasks):

┌─ Sidebar ─────────────────────┐
│ 🔵 Active Sessions (2)        │
│   ├ Session A     ● Running   │
│   └ Session B     ✓ Done      │
│                               │
│ 👥 Team Tasks (1)       [+]   │  ← NEW section
│   └ Refactor auth ● Running   │
│                               │
│ ⏰ Scheduled Tasks (3)        │
│   ...                         │
│                               │
│ 📁 Projects                   │
│   ...                         │
└───────────────────────────────┘

4-phase view (main content area switches to Orchestration Dashboard):

  1. Setup View: task description textarea, engine cards (multi-select, with running status), project selector, and an [Analyze Task] button
  2. Task Plan View: editable subtask cards from the LLM decomposition (description, engine, worktree toggle, dependencies), add/remove subtask controls, and an [Execute] button
  3. Execution Dashboard: subtask cards laid out by dependency topology (showing Round progress) with real-time status updates; each card has a [View Session] button that opens the child session's chat view
  4. Result View: LLM-aggregated summary, collapsible per-subtask results, and worktree merge/delete actions

Execution Dashboard example with DAG visualization:

┌─────────────────────────────────────────────────────────────┐
│  👥 Refactor auth module              ● Running  Round 2/3  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─ 1. Analyze arch ─── Claude ──────────────────────────┐  │
│  │  ✅ Completed · 45s                                   │  │
│  └───────────────────────────┬───────────────────────────┘  │
│                ┌─────────────┴──────────────┐               │
│                ▼                            ▼               │
│  ┌─ 2. Write tests ─ Copilot┐ ┌─ 3. Refactor ─ Claude ───┐  │
│  │  🔵 Running · Bash       │ │  🔵 Running · Edit       │  │
│  └─────────────┬────────────┘ └─────────────┬────────────┘  │
│                └─────────────┬──────────────┘               │
│                              ▼                              │
│  ┌─ 4. Verify integration ─── Claude ────────────────────┐  │
│  │  ○ Blocked · waiting for #2, #3                       │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  [Cancel All]                                               │
└─────────────────────────────────────────────────────────────┘

Child session navigation: Child sessions appear naturally in the Active Sessions sidebar section. The child session titlebar shows a breadcrumb "← Back to Team Task" to return to the Dashboard.

Task Decomposition Strategy

  • LLM proposes + user confirms: The Orchestrator sends a structured prompt; the LLM returns a JSON subtask array (with dependsOn dependencies)
  • Engine recommendation: The prompt describes each engine's strengths; the LLM recommends engine assignments; users can freely change these in the confirmation UI
  • Same-engine multi-session: A single engine can run multiple parallel sessions (each in its own worktree), so even single-engine setups benefit from orchestration
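
Because the plan arrives as LLM-generated JSON, it is worth validating before it reaches the confirmation UI. The sketch below shows one possible check. `PlannedSubtask` and `parseDecomposition` are hypothetical names, and the engine identifiers in the usage example ("claude", "copilot") are placeholders, since the actual `EngineType` values are not listed in this issue. Cycle detection is deliberately left out of this sketch.

```typescript
// Hypothetical shape of one entry in the LLM's JSON reply.
interface PlannedSubtask {
  id: string;
  description: string;
  engineType: string;
  dependsOn: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parses the LLM's JSON reply and rejects structurally invalid plans.
function parseDecomposition(raw: string, allowedEngines: string[]): PlannedSubtask[] {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (!Array.isArray(data)) throw new Error("Expected a JSON array of subtasks");

  const tasks: PlannedSubtask[] = data.map((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`Subtask ${i} is not an object`);
    }
    const { id, description, engineType, dependsOn } = item as Record<string, unknown>;
    if (typeof id !== "string" || typeof description !== "string") {
      throw new Error(`Subtask ${i} needs string "id" and "description"`);
    }
    if (typeof engineType !== "string" || allowedEngines.indexOf(engineType) === -1) {
      throw new Error(`Subtask ${id} uses an unavailable engine`);
    }
    if (!isStringArray(dependsOn)) {
      throw new Error(`Subtask ${id} needs a string array "dependsOn"`);
    }
    return { id, description, engineType, dependsOn };
  });

  // Every id must be unique, and every dependency must name another subtask.
  const ids = tasks.map((t) => t.id);
  for (const t of tasks) {
    if (ids.indexOf(t.id) !== ids.lastIndexOf(t.id)) {
      throw new Error(`Duplicate id: ${t.id}`);
    }
    for (const dep of t.dependsOn) {
      if (dep === t.id || ids.indexOf(dep) === -1) {
        throw new Error(`Subtask ${t.id} has an invalid dependency: ${dep}`);
      }
    }
  }
  return tasks;
}

// Usage with placeholder engine identifiers.
const plan = parseDecomposition(
  '[{"id":"s1","description":"Analyze arch","engineType":"claude","dependsOn":[]},' +
    '{"id":"s2","description":"Write tests","engineType":"copilot","dependsOn":["s1"]}]',
  ["claude", "copilot"],
);
```

Passing only the engines the user selected as `allowedEngines` keeps the LLM from assigning a subtask to an engine that is not running.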

Data Model

interface OrchestrationRun {
  id: string;
  parentSessionId: string;  // orchestrator session
  directory: string;
  status: "setup" | "decomposing" | "confirming" | "dispatching"
        | "running" | "aggregating" | "completed" | "failed" | "cancelled";
  prompt: string;
  engineTypes: EngineType[];
  subtasks: OrchestrationSubtask[];
  resultSummary?: string;
  createdAt: number;
  completedAt?: number;
}

interface OrchestrationSubtask {
  id: string;
  description: string;
  engineType: EngineType;
  dependsOn: string[];  // subtask IDs
  sessionId?: string;
  worktreeId?: string;
  needsWorktree: boolean;
  status: "blocked" | "pending" | "running" | "completed" | "failed";
  resultSummary?: string;
  error?: string;
  duration?: number;
  toolUses?: number;
}

Implementation Steps

| Step | File | Operation | Description |
|---|---|---|---|
| 1 | src/types/unified.ts | Modify | Add OrchestrationRun/Subtask types and Gateway message types |
| 2 | electron/main/services/orchestrator-service.ts | New | Core orchestration service: DAG execution, context injection, result aggregation |
| 3 | electron/main/gateway/ws-server.ts | Modify | Route orchestration requests + broadcast orchestration.updated events |
| 4 | electron/main/index.ts | Modify | Initialize OrchestratorService |
| 5 | src/lib/gateway-api.ts | Modify | Frontend API methods + notification handler |
| 6 | src/stores/orchestration.ts | New | Orchestration state store |
| 7 | src/components/TeamTaskSection.tsx | New | Sidebar Team Tasks section |
| 8 | src/components/SessionSidebar.tsx | Modify | Insert TeamTaskSection |
| 9 | src/components/OrchestrationDashboard.tsx | New | Main Dashboard component (4-phase views) |
| 10 | src/pages/Chat.tsx | Modify | View switching + child session breadcrumb navigation |

Reused Existing Infrastructure

  • WorktreeManager.create/remove/merge: worktree lifecycle management
  • EngineManager.createSession/sendMessage: session lifecycle management
  • ScheduledTaskService permission auto-approve pattern: unattended subtask execution
  • computeActiveSessions: child sessions automatically appear in Active Sessions
  • getEngineBadge() / StatusIndicator: engine badges and status icon reuse

Phase 2: Intelligence

  • Engine-Aware Routing: Smart assignment of subtasks to the best-suited engine (e.g., Claude for reasoning-heavy tasks, Copilot for code generation)
  • Conflict Resolution: Detect and resolve merge conflicts when multiple worktrees modify overlapping files
  • Retry & Fallback: If one engine fails a subtask, auto-retry on a different engine
  • Cost Management: Per-subtask token/cost tracking with budget controls
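
As an illustration of the detection half of Conflict Resolution, the sketch below flags files changed in more than one worktree. It assumes the Orchestrator can obtain a list of changed paths per worktree (for example from `git diff --name-only` run inside each worktree, which is one possible source). `FileConflict` and `findOverlappingFiles` are hypothetical names, and actually resolving the conflicts is out of scope here.

```typescript
interface FileConflict {
  file: string;
  worktrees: string[]; // ids of every worktree that changed this file
}

// changedFiles maps a worktree id to the paths it modified.
function findOverlappingFiles(changedFiles: Record<string, string[]>): FileConflict[] {
  const owners = new Map<string, string[]>();
  for (const worktreeId of Object.keys(changedFiles)) {
    // De-duplicate paths within one worktree so it is counted once per file.
    const unique = Array.from(new Set(changedFiles[worktreeId]));
    for (const file of unique) {
      const list = owners.get(file) ?? [];
      list.push(worktreeId);
      owners.set(file, list);
    }
  }

  const conflicts: FileConflict[] = [];
  owners.forEach((worktrees, file) => {
    if (worktrees.length > 1) conflicts.push({ file, worktrees });
  });
  return conflicts.sort((a, b) => a.file.localeCompare(b.file));
}

// Example data; the file names are illustrative.
const overlap = findOverlappingFiles({
  "wt-1": ["src/auth/index.ts", "src/auth/jwt.ts"],
  "wt-2": ["test/auth.test.ts", "src/auth/index.ts"],
  "wt-3": ["README.md"],
});
// overlap: [{ file: "src/auth/index.ts", worktrees: ["wt-1", "wt-2"] }]
```

A non-empty result could be surfaced in the Result View before the user merges any worktree.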

Phase 3: Advanced

  • Human-in-the-Loop Checkpoints: Pause orchestration at defined points for user review before proceeding
  • Team Presets: User-defined orchestration templates (e.g., "Code Review Team" = Claude analyzes + Copilot suggests fixes + OpenCode runs tests)
  • Dynamic Re-planning: Orchestrator can modify the DAG mid-execution based on intermediate results

🏗️ Why CodeMux Is Uniquely Positioned

| Existing Infrastructure | How It Enables Agent Team |
|---|---|
| EngineManager + multi-session routing | Natural foundation for parallel session dispatch |
| WorktreeManager | Code isolation between parallel agents is already built |
| Unified Type System | Results from any engine are already normalized, so aggregation is straightforward |
| WebSocket Gateway | Real-time progress streaming to all clients (desktop, browser, IM) |
| IM Channel Adapters | Agent Team results accessible from Feishu/DingTalk/Telegram |
| ScheduledTaskService | Permission auto-approve pattern reusable for unattended subtask execution |
| Permission/Question System | Supports human-in-the-loop approval during orchestration |

🤔 Design Decisions (Resolved)

| Question | Decision |
|---|---|
| Orchestrator implementation | OrchestratorService as a layer above EngineManager, using LLM for task decomposition |
| Task decomposition strategy | LLM-driven decomposition with user confirmation/editing in UI |
| Scope of first iteration | Jump directly to cross-engine orchestration (same-engine internal parallelism deferred to Agent Team) |
| Inter-session communication | Hub-and-Spoke: Orchestrator extracts upstream results → injects into downstream prompts |
| Execution model | DAG-based with multi-round iteration, not simple parallel-then-aggregate |
| UX approach | Explicit UI (Sidebar section + Dashboard), no slash commands |


This feature would make CodeMux the first multi-engine AI coding client with cross-engine collaborative agent workflows, a capability that no single-engine tool (Claude Code, Copilot, Cursor, etc.) can offer on its own.
