
refactor: link video jobs to logs by internal id #2711

Open
steebchen wants to merge 4 commits into main from fix-video-internal-id-queries


@steebchen steebchen commented Jun 16, 2026

Member

Summary

Addresses the CodeRabbit concern on apps/api/src/routes/video.ts: the video status route resolved its log row via log.findFirst({ where: { requestId } }), an unscoped match on requestId that could return the wrong record. The video job flow did this in several places (getVideoLogIdByRequestId(job.requestId) in the gateway and worker, plus a projectId + requestId reverse lookup), each re-querying the log table to recover an id it could have stored.

This switches the whole video job flow to a persisted internal id:

  • Schema: add video_job.logId (FK → log.id, ON DELETE SET NULL) + index.
  • Worker (finalizeVideoJob): set logId on the job in the same atomic update that stamps resultLoggedAt, so the link is written exactly when the log is created.
  • API status route: read job.logId directly — no extra query, and already access-checked by organization.
  • Gateway: content routes use job.logId; the log→job reverse lookup matches on video_job.logId instead of projectId + requestId.
  • Removed the now-unused getVideoLogIdByRequestId helper from both the gateway and worker.

The migration backfills logId for existing rows from the current requestId linkage (the only link that exists pre-migration).
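
To make the motivation concrete, here is a minimal in-memory sketch of the two lookup styles. The row shapes and sample data are hypothetical (only the field names requestId, projectId, and logId come from this PR), and the real code goes through the database layer rather than arrays.

```typescript
// Hypothetical in-memory rows; the data is invented for illustration.
interface LogRow {
  id: string;
  requestId: string;
  projectId: string;
}

interface VideoJobRow {
  id: string;
  requestId: string;
  projectId: string;
  logId: string | null;
}

const logRows: LogRow[] = [
  // An unrelated log that happens to reuse the same request id.
  { id: "log_a", requestId: "req-1", projectId: "proj_other" },
  // The log that actually belongs to the video job.
  { id: "log_b", requestId: "req-1", projectId: "proj_video" },
];

const videoJob: VideoJobRow = {
  id: "job_1",
  requestId: "req-1",
  projectId: "proj_video",
  logId: "log_b",
};

// Old style: first row whose requestId matches, with no further scoping.
function findLogIdByRequestId(requestId: string): string | undefined {
  return logRows.find((row) => row.requestId === requestId)?.id;
}

// New style: the id was persisted at finalization, so no lookup is needed.
function resolveLogId(job: VideoJobRow): string | null {
  return job.logId;
}

console.log(findLogIdByRequestId(videoJob.requestId)); // log_a (the wrong row)
console.log(resolveLogId(videoJob)); // log_b
```

The unscoped match returns whichever row comes first, while the stored id cannot drift to another record.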

Testing

  • pnpm format + pnpm build (17/17) pass.
  • apps/gateway/src/videos/videos.spec.ts: the 15 failing cases fail identically on the base commit (they need live provider credentials); this change is test-neutral.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Performance
    • Improved video content loading speed by eliminating redundant database lookups
    • Streamlined video-to-content URL resolution for faster and more reliable access to video files

The video job flow resolved its log row by matching on requestId
(log.findFirst({ requestId })), which can return the wrong record if
request IDs ever collide and forces a separate query everywhere a job
needs its log.

Persist the log's internal id on video_job.logId when the job is
finalized (set atomically with resultLoggedAt), and use it for every
internal job<->log lookup: the API status route now reads job.logId
directly (no query), the gateway content routes use job.logId, and the
log->job reverse lookup matches on video_job.logId. Backfills existing
rows from the current requestId linkage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 16, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1b48d553-3d1f-40cf-b312-7662ae1e54dd

📥 Commits

Reviewing files that changed from the base of the PR and between 8881430 and bf59c77.

📒 Files selected for processing (5)
  • apps/gateway/src/videos/videos.ts
  • packages/db/migrations/1781647635_striped_sunspot.sql
  • packages/db/migrations/meta/1781647635_snapshot.json
  • packages/db/migrations/meta/_journal.json
  • packages/db/src/schema.ts
💤 Files with no reviewable changes (1)
  • packages/db/migrations/1781647635_striped_sunspot.sql
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/db/src/schema.ts
  • apps/gateway/src/videos/videos.ts

Walkthrough

Adds a log_id FK column to video_job (with backfill migration, index, and ON DELETE SET NULL constraint). finalizeVideoJob now persists this logId atomically. All content-serving paths in the worker, gateway, and API route replace secondary requestId→log DB lookups with direct reads of job.logId.

Changes

video_job logId column and lookup removal

| Layer / File(s) | Summary |
| --- | --- |
| DB schema: logId column, index, migration, and journal. Files: packages/db/src/schema.ts, packages/db/migrations/1781647635_striped_sunspot.sql, packages/db/migrations/meta/_journal.json | videoJob gains a nullable logId FK referencing log.id with onDelete: "set null" and a video_job_log_id_idx index. The SQL migration adds the column, backfills it by joining on request_id/project_id/organization_id, creates the index, and adds the FK constraint. The journal registers entry idx: 164. |
| Worker: write logId at finalization, remove requestId lookup. File: apps/worker/src/services/video-jobs.ts | finalizeVideoJob atomically sets logId on the videoJob row alongside resultLoggedAt. getPublicVideoContentUrl resolves logId as logId ?? job.logId, and the getVideoLogIdByRequestId helper querying tables.log by requestId is deleted. |
| Gateway: replace requestId-based logId lookups with job.logId. File: apps/gateway/src/videos/videos.ts | Removes the getVideoLogIdByRequestId helper. getPublicVideoContentUrl uses job.logId directly. The /logs/{log_id}/content handler matches videoJob by videoJob.logId == log.id. Both the direct-proxy and external-content branches of /videos/{video_id}/content read logId = job.logId for download marking. |
| API route: use job.logId for completed video content URL. File: apps/api/src/routes/video.ts | GET /{videoId} gates content array construction on job.logId being present and uses it to generate the signed URL directly, removing the secondary tables.log query by job.requestId. |
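
The gating described for GET /{videoId} could look roughly like the sketch below. This is not the route's actual code: signContentUrl, the URL shape, and the CompletedJob and ContentItem types are hypothetical stand-ins, since the page does not show the real signer or response schema.

```typescript
interface CompletedJob {
  id: string;
  status: "completed" | "failed";
  logId: string | null;
}

interface ContentItem {
  type: "video";
  url: string;
}

// Hypothetical stand-in for the real URL signer, which this page does not show.
function signContentUrl(logId: string): string {
  return `https://example.invalid/logs/${logId}/content?sig=placeholder`;
}

function buildContent(job: CompletedJob): ContentItem[] {
  // The column is nullable, so rows without a linked log yield no content.
  if (job.status !== "completed" || job.logId === null) {
    return [];
  }
  // No secondary query: the id is read straight off the job row.
  return [{ type: "video", url: signContentUrl(job.logId) }];
}
```

Returning an empty array for a null logId keeps unlinked rows from producing a signed URL that points at nothing.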

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: refactoring to link video jobs to logs using internal database ids instead of requestId matching. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0371d77562

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/worker/src/services/video-jobs.ts Outdated
.update(tables.videoJob)
.set({
  resultLoggedAt: now,
  logId,


P1: Insert the log before assigning its FK

When a terminal video job is finalized, this update writes a freshly generated logId before tx.insert(tables.log) creates that row. The migration adds a non-deferrable FK from video_job.log_id to log.id, so Postgres checks this statement immediately and rejects every first-time finalization with a foreign-key violation; the transaction never reaches the log insert, so completed/failed video jobs are not logged and the rest of finalization cannot run. Insert the log first or assign logId in a second update after the insert.
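
The ordering problem can be reproduced with a toy store that checks the reference immediately, the way a non-deferrable FK does. ToyDb and both finalize functions are hypothetical and only mimic the behaviour described above; they are not the worker's code.

```typescript
// Toy store that mimics an immediate (non-deferrable) FK check on video_job.log_id.
class ToyDb {
  private logIds = new Set<string>();
  private jobLogId = new Map<string, string | null>();

  insertLog(id: string): void {
    this.logIds.add(id);
  }

  setJobLogId(jobId: string, logId: string): void {
    // The check runs at statement time, not at commit time.
    if (!this.logIds.has(logId)) {
      throw new Error(`foreign-key violation: log ${logId} does not exist`);
    }
    this.jobLogId.set(jobId, logId);
  }

  getJobLogId(jobId: string): string | null {
    return this.jobLogId.get(jobId) ?? null;
  }
}

// Order flagged in the review: reference first, insert second.
function finalizeWrongOrder(db: ToyDb, jobId: string, logId: string): void {
  db.setJobLogId(jobId, logId); // throws: the log row is not there yet
  db.insertLog(logId); // never reached
}

// Suggested order: insert the log, then point the job at it.
function finalizeFixedOrder(db: ToyDb, jobId: string, logId: string): void {
  db.insertLog(logId);
  db.setJobLogId(jobId, logId);
}
```

With the wrong order the first statement throws, so the log insert never runs, which matches the failure mode the comment describes.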


@@ -0,0 +1,4 @@
ALTER TABLE "video_job" ADD COLUMN "log_id" text;--> statement-breakpoint
UPDATE "video_job" SET "log_id" = "log"."id" FROM "log" WHERE "log"."request_id" = "video_job"."request_id";--> statement-breakpoint


P2: Scope the log backfill to the job project

This backfill matches only on request_id, but video request IDs are accepted from the caller's x-request-id header and are only indexed, not unique, so the same value can exist in multiple projects or orgs. In that case this UPDATE can attach a video_job to an unrelated log; after this change the API and gateway trust job.logId for signed content URLs and log-to-job lookup, so migrated historical jobs can serve or fetch the wrong log's video content. Include at least project_id (and ideally organization_id) in the join.


@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
packages/db/migrations/1781645725_pink_morlocks.sql (1)

2-2: 💤 Low value

Consider scoping the backfill by project_id for accuracy.

The current backfill joins only on request_id, which could link a job to the wrong log if request IDs happen to collide across different projects. Both tables have project_id—adding it to the join would make the backfill more precise.

Suggested improvement
-UPDATE "video_job" SET "log_id" = "log"."id" FROM "log" WHERE "log"."request_id" = "video_job"."request_id";--> statement-breakpoint
+UPDATE "video_job" SET "log_id" = "log"."id" FROM "log" WHERE "log"."request_id" = "video_job"."request_id" AND "log"."project_id" = "video_job"."project_id";--> statement-breakpoint
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/migrations/1781645725_pink_morlocks.sql` at line 2, The UPDATE
statement in the migration joins the video_job and log tables only on
request_id, which could incorrectly match rows if request IDs collide across
different projects. Modify the WHERE clause in the UPDATE statement to add an
additional AND condition that also matches on project_id from both tables,
ensuring the backfill accurately scopes the join to both request_id and
project_id.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ce017fb7-1381-41af-b238-d5daf31b7766

📥 Commits

Reviewing files that changed from the base of the PR and between 6f8c753 and 0371d77.

📒 Files selected for processing (7)
  • apps/api/src/routes/video.ts
  • apps/gateway/src/videos/videos.ts
  • apps/worker/src/services/video-jobs.ts
  • packages/db/migrations/1781645725_pink_morlocks.sql
  • packages/db/migrations/meta/1781645725_snapshot.json
  • packages/db/migrations/meta/_journal.json
  • packages/db/src/schema.ts

steebchen and others added 3 commits June 16, 2026 23:01
Address PR review:

- The FK video_job.log_id -> log.id is non-deferrable, so writing logId
  in the claim update before the log row exists raised an immediate
  foreign-key violation on first finalization. Insert the log first,
  then set logId in a second update within the same transaction (the
  claim update still guards concurrency via resultLoggedAt).
- Scope the historical backfill by project_id and organization_id in
  addition to request_id, since request IDs are caller-supplied and
  only indexed, not unique across projects/orgs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
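
A rough in-memory sketch of the finalization order this commit describes: claim via resultLoggedAt, insert the log, then set logId. The WHERE-style null check in claimJob is an assumption about how the claim update guards concurrency, and the Map and array stand in for tables with no real transaction behind them.

```typescript
interface JobState {
  id: string;
  resultLoggedAt: Date | null;
  logId: string | null;
}

// Hypothetical stand-ins for the video_job and log tables.
const jobTable = new Map<string, JobState>([
  ["job_1", { id: "job_1", resultLoggedAt: null, logId: null }],
]);
const insertedLogIds: string[] = [];

// Assumed claim shape: succeed only while resultLoggedAt is still null,
// so exactly one caller wins.
function claimJob(jobId: string, now: Date): boolean {
  const job = jobTable.get(jobId);
  if (!job || job.resultLoggedAt !== null) {
    return false;
  }
  job.resultLoggedAt = now;
  return true;
}

function finalizeJob(jobId: string, logId: string): boolean {
  if (!claimJob(jobId, new Date())) {
    return false; // another finalizer already claimed this job
  }
  insertedLogIds.push(logId); // 1. insert the log row first
  const job = jobTable.get(jobId);
  if (job) {
    job.logId = logId; // 2. then point the job at the existing log
  }
  return true;
}
```

A second call for the same job returns false and writes nothing, so the extra update does not weaken the single-writer guarantee.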
Reset migrations to origin/main per repo procedure; regenerated below.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf59c77772


@@ -0,0 +1,4 @@
ALTER TABLE "video_job" ADD COLUMN "log_id" text;--> statement-breakpoint
UPDATE "video_job" SET "log_id" = "log"."id" FROM "log" WHERE "log"."request_id" = "video_job"."request_id" AND "log"."project_id" = "video_job"."project_id" AND "log"."organization_id" = "video_job"."organization_id";--> statement-breakpoint


P2: Make the backfill ignore ambiguous request IDs

Fresh evidence: this revised backfill now scopes by project and organization, but it still updates from every log row sharing the client-supplied request_id. Since x-request-id is accepted from callers and log.request_id is only indexed, a single project can have multiple matching logs for the same value (for example a retry, or a chat request and a video request reusing the header); PostgreSQL UPDATE ... FROM can then choose an arbitrary log.id. After this change the gateway resolves video content solely through video_job.log_id, so an ambiguous historical backfill can attach the job to the wrong log and make the real video log URL 404 or mark downloads on an unrelated log.
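
One possible policy, sketched in memory: link a job only when exactly one log matches on request, project, and organization, and leave logId null otherwise. The review does not prescribe this exact rule, and the types and backfillLogIds function are hypothetical; a real fix would live in the SQL migration.

```typescript
interface BackfillLog {
  id: string;
  requestId: string;
  projectId: string;
  organizationId: string;
}

interface BackfillJob {
  id: string;
  requestId: string;
  projectId: string;
  organizationId: string;
  logId: string | null;
}

function backfillLogIds(jobs: BackfillJob[], logs: BackfillLog[]): BackfillJob[] {
  return jobs.map((job) => {
    const candidates = logs.filter(
      (log) =>
        log.requestId === job.requestId &&
        log.projectId === job.projectId &&
        log.organizationId === job.organizationId,
    );
    // Link only on an unambiguous match; zero or several candidates are skipped.
    const only = candidates.length === 1 ? candidates[0] : undefined;
    return { ...job, logId: only ? only.id : job.logId };
  });
}
```

Skipped jobs keep a null logId, which trades a missing link for never attaching a job to the wrong log.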
