fix(codex): fully sync provider switch state for Codex app sessions #346

Draft
luuuuyang wants to merge 1 commit into UfoMiao:main from luuuuyang:codex/fix-codex-provider-session-sync

Conversation

@luuuuyang

Summary

Fixes #345.

This keeps Codex provider switching aligned across config, auth, and session visibility:

  • route direct config-switch Codex provider changes through switchToProvider() instead of the partial switchCodexProvider() path
  • use the same full switch flow when Codex multi-config sets a default provider
  • sync session_meta.payload.model_provider inside rollout JSONL files under ~/.codex/sessions and ~/.codex/archived_sessions

Why

switchCodexProvider() only updates model_provider, so direct switches could leave auth.json pointing at the previous provider key. Separately, Codex Desktop/App session visibility can drift when existing rollout metadata still points at the old provider.

This implementation stays compatible with Node 18 by syncing JSONL session metadata directly and avoiding any SQLite dependency.

Testing

  • corepack pnpm vitest "tests/unit/commands/config-switch.test.ts" "tests/unit/commands/config-switch-claude-code.test.ts" "tests/unit/commands/handleMultiConfigurations.test.ts" "tests/unit/commands/init-multi-config.test.ts" "tests/unit/utils/code-tools"
  • corepack pnpm typecheck

@WitMiao

WitMiao commented May 2, 2026

Collaborator

Code Review by Claude

Thanks for the careful fix, @luuuuyang! 🙏 Routing direct switches through switchToProvider() and syncing the rollout JSONL metadata is the right shape for #345, and the Node-18-compat path (no SQLite) is a nice touch.

That said, I want to flag two P1 side effects before this lands, plus a few smaller cleanups.


🔴 P1 #1 — Every provider switch full-copies ~/.codex/, including all session history

switchToProvider() calls backupCodexComplete() → backupCodexFiles(), which only filters out backup/ and tmp/:

// src/utils/code-tools/codex.ts:214-220
const filter = (path: string): boolean => {
  return !path.includes('/backup') && !path.startsWith(tmpDir)
}
copyDir(CODEX_DIR, backupDir, { filter })

On a typical user machine I measured: ~/.codex/sessions is 57 MB / 231 JSONL files, and archived_sessions grows over time. This PR routes direct switches (config-switch <provider> -T codex) and multi-config init "set default provider" through this path too, so the call frequency goes up materially. A user who switches providers a handful of times per day can easily accumulate hundreds of MB to multiple GB of redundant backups.

This isn't strictly a bug introduced by this PR, but the PR amplifies it. I'd recommend fixing it in the same PR:

const filter = (path: string): boolean => {
  return !path.includes('/backup')
    && !path.startsWith(tmpDir)
    && !path.includes('/sessions')
    && !path.includes('/archived_sessions')
}

Alternatively, do the full backup only on first init, and on switches fall back to backing up just config.toml + auth.json.
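
One caveat with the substring checks above: path.includes('/backup') runs against the full path, so a home directory such as /home/backup-user would exclude everything. A minimal sketch of a segment-based filter follows. makeBackupFilter and EXCLUDED_TOP_LEVEL are hypothetical names, not part of zcf, and the sketch assumes that backup/ and tmp/ sit directly under the Codex directory.

```typescript
// Hypothetical helper (not part of zcf): builds a copyDir-style filter that
// excludes top-level entries of the Codex directory by exact name.
const EXCLUDED_TOP_LEVEL: ReadonlySet<string> = new Set([
  'backup',
  'tmp',
  'sessions',
  'archived_sessions',
])

function makeBackupFilter(
  codexDir: string,
  excluded: ReadonlySet<string> = EXCLUDED_TOP_LEVEL,
): (path: string) => boolean {
  // Normalize separators and trailing slashes so Windows-style paths also work.
  const normalize = (p: string): string => p.replace(/\\/g, '/').replace(/\/+$/, '')
  const root = normalize(codexDir)

  return (path: string): boolean => {
    const candidate = normalize(path)
    if (candidate === root)
      return true
    // Anything outside the Codex directory is never copied.
    if (!candidate.startsWith(`${root}/`))
      return false
    // Only the first segment below the root decides inclusion.
    const [firstSegment = ''] = candidate.slice(root.length + 1).split('/')
    return !excluded.has(firstSegment)
  }
}
```

Matching on the first path segment keeps a file like sessions-notes.md in the backup while still skipping the whole sessions/ tree.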


🔴 P1 #2 — Silently overwrites historical session ownership; original model_provider is lost forever

// src/utils/code-tools/codex.ts:101-106
record.payload = {
  ...record.payload,
  model_provider: providerId,
}

syncCodexSessionProviders() rewrites session_meta.payload.model_provider to the new provider for every session file. The implication:

  • A session originally created under provider A is rewritten to B after a switch
  • Switch back to A → another full rewrite. The original "A" attribution is permanently lost.
  • This is essentially a workaround for Codex Desktop's UI filter logic — which is exactly what the issue reporter proposed — but the side effect is that the "session ↔ creating provider" association is destroyed.

The PR description doesn't mention this. I'd suggest one (or a combination) of:

  1. Inform the user: print Will sync N session(s) to "<providerId>" before rewriting, so the change is visible.
  2. Preserve history: on first sync, write the original value to a new original_model_provider field (idempotent on subsequent runs).
  3. Make it opt-in: only rewrite when --sync-sessions is passed; default to syncing config + auth only.

Personally I'd go with 1 + 2: visible side effect, and recoverable.
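
To make option 2 concrete, here is a rough sketch of a rewrite step that records the first-seen provider once and is a no-op when the file is already in sync. It is not the PR's implementation: rewriteProviderKeepingOriginal and tryParseRecord are hypothetical names, and the record shape ({ type: 'session_meta', payload: {...} }) is an assumption based on the session_meta.payload.model_provider path mentioned in the description.

```typescript
interface SessionRecord {
  type?: unknown
  payload?: unknown
  [key: string]: unknown
}

// Returns the parsed object, or null for malformed lines and non-object values.
function tryParseRecord(line: string): SessionRecord | null {
  try {
    const value: unknown = JSON.parse(line)
    return typeof value === 'object' && value !== null ? (value as SessionRecord) : null
  }
  catch {
    return null
  }
}

// Hypothetical variant of the rewrite step; returns null when nothing changed.
function rewriteProviderKeepingOriginal(content: string, providerId: string): string | null {
  let changed = false
  const lines = content.split('\n').map((rawLine) => {
    // Remember a trailing \r so CRLF files keep their line endings.
    const hasCr = rawLine.endsWith('\r')
    const line = hasCr ? rawLine.slice(0, -1) : rawLine
    const record = tryParseRecord(line)
    const payload = record?.payload
    if (!record || record.type !== 'session_meta' || typeof payload !== 'object' || payload === null)
      return rawLine // other record types and malformed lines are preserved as-is

    const fields = payload as Record<string, unknown>
    const current = fields.model_provider
    if (current === providerId)
      return rawLine // already in sync: no rewrite, no write

    record.payload = {
      ...fields,
      model_provider: providerId,
      // Written on the first sync only; later syncs keep the recorded value.
      original_model_provider: fields.original_model_provider ?? current,
    }
    changed = true
    return JSON.stringify(record) + (hasCr ? '\r' : '')
  })
  return changed ? lines.join('\n') : null
}
```

With this shape, switching A → B → A leaves original_model_provider at "A" throughout, so the attribution stays recoverable.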


🟡 P2 #1 — All errors are swallowed; user sees "success" while files may have silently failed to update

// src/utils/code-tools/codex.ts:135-145
for (const sessionFile of sessionFiles) {
  try {
    const updatedContent = rewriteSessionMetaProvider(readFile(sessionFile), providerId)
    if (updatedContent !== null) {
      writeFile(sessionFile, updatedContent)
    }
  }
  catch {
    // Ignore malformed or transient session files and keep the provider switch successful.
  }
}

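Read-only permissions and a full disk are both swallowed here, and a SIGINT mid-loop leaves a partially synced set with nothing reported. Then we print providerSwitchSuccess while some sessions still point at the old provider, which is exactly the "sessions still missing" symptom the issue is trying to fix. At minimum, count failures and warn: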

✔ Provider switched to test-provider
⚠ Synced sessions: 198 updated, 33 failed
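
A rough sketch of what the counting could look like, keeping the switch itself successful while surfacing failures. The names syncWithReport, formatSyncReport, SessionIo, and SyncReport are hypothetical, and I/O is injected here only so the sketch stands alone; the real code would keep using its existing readFile/writeFile helpers.

```typescript
// Hypothetical I/O seam; stands in for the project's own fs helpers.
interface SessionIo {
  readFile: (path: string) => string
  writeFile: (path: string, content: string) => void
}

interface SyncReport {
  updated: number
  skipped: number
  failed: string[]
}

// `rewrite` returns the new file content, or null when no change is needed.
function syncWithReport(
  files: string[],
  rewrite: (content: string) => string | null,
  io: SessionIo,
): SyncReport {
  const report: SyncReport = { updated: 0, skipped: 0, failed: [] }
  for (const file of files) {
    try {
      const next = rewrite(io.readFile(file))
      if (next === null) {
        report.skipped++
        continue
      }
      io.writeFile(file, next)
      report.updated++
    }
    catch {
      // Still non-fatal for the switch, but no longer invisible.
      report.failed.push(file)
    }
  }
  return report
}

function formatSyncReport(report: SyncReport): string {
  return `Synced sessions: ${report.updated} updated, ${report.failed.length} failed`
}
```

Keeping the failed paths (not just a count) also makes it cheap to list them under a verbose flag later.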

🟡 P2 #2 — switchCodexProvider() is now half-dead but still exported and still tested for its old behavior

After this PR, every internal caller has moved:

| Location | Before | After |
| --- | --- | --- |
| src/commands/config-switch.ts:161 | switchCodexProvider | switchToProvider |
| src/commands/init.ts:1287 | switchCodexProvider | switchToProvider |

But the function is still exported, and tests/unit/utils/code-tools/codex-provider-switch.test.ts still pins its old behavior ("only updates model_provider, doesn't touch auth.json"). Two risks:

  1. If zcf is consumed as a library, downstream callers still using switchCodexProvider hit the exact half-sync bug from #345.
  2. Old tests passing ≠ old behavior is safe — they actively hide the risk.

Suggest one of: add @deprecated JSDoc + a runtime console.warn, or just remove it and migrate the tests onto switchToProvider.
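
If the deprecation route is chosen, one possible shape is a small warn-once wrapper, sketched below. warnOnce and legacySwitchExample are hypothetical names, and the wrapped function is a stand-in rather than the real switchCodexProvider, whose signature is not shown in this review.

```typescript
// Hypothetical generic wrapper: emits the deprecation message on the first
// call only, then delegates to the original function unchanged.
function warnOnce<A extends unknown[], R>(
  fn: (...args: A) => R,
  message: string,
  warn: (msg: string) => void = msg => console.warn(msg),
): (...args: A) => R {
  let warned = false
  return (...args: A): R => {
    if (!warned) {
      warned = true
      warn(message)
    }
    return fn(...args)
  }
}

/**
 * Stand-in for the legacy export, shown only to illustrate the JSDoc tag.
 * @deprecated Updates model_provider only; use switchToProvider() instead.
 */
const legacySwitchExample = warnOnce(
  (providerId: string): string => providerId,
  'switchCodexProvider() is deprecated: it does not sync auth.json. Use switchToProvider().',
)
```

The @deprecated tag gives editor strikethrough for TypeScript consumers, while the runtime warning covers plain JavaScript callers.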


🟢 P3 — Test coverage gaps

The new session-sync test in tests/unit/utils/code-tools/codex.test.ts covers single file + happy path only. Missing:

  • archived_sessions with actual files (current mock returns [])
  • A single file with mixed records: session_meta + response_item
  • A session whose model_provider already matches → should skip and not call writeFile (idempotency)
  • Malformed JSON line → preserved unchanged, no throw
  • CRLF line endings (the code handles them, but no regression guard)
  • writeFile throws → switchToProvider should still return true

tests/unit/commands/config-switch.test.ts:277 uses mockRejectedValue, but the real switchToProvider has internal try/catch and always returns a boolean — it never throws. Suggest adding a mockResolvedValue(false) case to verify what configSwitchCommand shows the user when a switch fails.


🟢 P3 — Smaller items

  • JSON.stringify(record) loses the original line's formatting (key order, whitespace). Semantically equivalent, but if any downstream tool does line-level diff or checksums it'll flag the changed lines as "not written by Codex". Worth a one-line comment in the function.
  • All session I/O is sync (readDir/readFile/writeFile/isFile). Fine for ~hundreds of files, but a user with thousands of sessions in a deep year/month/day tree will see a noticeable pause. Consider printing Syncing N session files... when N > 100.
  • CODEX_SESSION_DIRECTORIES = ['sessions', 'archived_sessions'] is hardcoded — if Codex ever adds a new session location it'll be silently missed. A short comment noting where to update it would help future maintainers.
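
To illustrate the JSON.stringify point from the first item: a parse/stringify round trip keeps the value but not the bytes. The sample line below is made up for the demonstration and is not taken from a real rollout file.

```typescript
// A made-up record with extra whitespace and a \u escape, as another writer
// might have produced it.
const original = '{ "type": "session_meta",  "payload": {"note": "caf\\u00e9"} }'

// Round trip through the same calls the sync step relies on.
const roundTripped = JSON.stringify(JSON.parse(original))

// The text differs: whitespace is dropped and the escape becomes a literal character.
const bytesChanged = roundTripped !== original

// The parsed values are still identical, so consumers that parse JSON are unaffected.
const valueUnchanged = JSON.stringify(JSON.parse(roundTripped)) === roundTripped
```

Only tools that hash or diff raw lines would notice, which is why a one-line comment in the function seems sufficient.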

Summary

| Priority | Issue | Action |
| --- | --- | --- |
| 🔴 P1 | Backup balloons disk | Filter out sessions/archived_sessions |
| 🔴 P1 | Silent rewrite of session ownership | Notify user + preserve original_model_provider |
| 🟡 P2 | Errors swallowed | Track failure count and warn |
| 🟡 P2 | switchCodexProvider half-dead | @deprecated or remove |
| 🟢 P3 | Test gaps | Add ~5 edge cases |
| 🟢 P3 | Sync I/O | Print progress on large session sets |

The core fix is sound ✅. I'd just recommend handling the two P1 side effects before flipping out of draft; otherwise we'd resolve #345 while introducing two new problems (disk bloat + session-attribution loss).

Thanks again for digging into this — the JSONL-without-SQLite approach is exactly the right call for the Node 18 compat goal. 👏

Development

Successfully merging this pull request may close these issues.

bug(codex): switching provider can hide existing Codex app sessions and direct switch does not fully sync auth

2 participants