SW-1246: cursor pagination on v2 data endpoint #690

Merged
wheelsandcogs merged 3 commits into main from SW-1246 on Jun 3, 2026

@wheelsandcogs wheelsandcogs commented May 29, 2026

Collaborator

Summary

Adds keyset (cursor) pagination to the v2 GET /:dataset_id/data endpoint as a faster, more attack-resistant alternative to deep LIMIT/OFFSET reads. v1 is out of scope (legacy, to be deprecated). The pivot endpoint is also out of scope (separate ticket).

SW-1246 — follow-up to the 2026-04-23 incident, where deep-paginated requests with rotating page_number bypassed Front Door cache and saturated the application DB pool.

What changed

  • New cursor query param on /v2/:dataset_id/data (and the filtered variant). Opaque base64url token, serialised as a compact positional array [version, contextHash, direction, keyTuple]. Bound to queryStoreId + revisionId + language + sortHash via a single context hash that is recomputed and revalidated on every decode. Any mismatch → 400 errors.invalid_cursor.
  • Default ordering when the caller passes no sort_by: first Time column on the fact table, falling back to first Dimension, with the composite PK appended as tie-breakers. Applied to both cursor and offset paths so the offset→cursor handover at the page-100 boundary is seamless.
  • Multi-column sort_by supported on the cursor path. OR-ladder WHERE construction handles mixed ASC/DESC and NULLs explicitly: IS NOT DISTINCT FROM for equality rungs, explicit NULL placement rungs for inequality, and NULL ordering that follows the Postgres defaults (NULLS LAST on ASC, NULLS FIRST on DESC).
  • Response envelope gains next_cursor / prev_cursor (nullable). In cursor mode current_page / start_record / end_record are null — they're not meaningful when paginating by keyset. total_records / total_pages stay (cached on QueryStore).
  • Swagger regenerated with the new cursor parameter and updated PageInfoV2.
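
For readers unfamiliar with the OR-ladder, a minimal sketch of the idea follows. It is an illustration under stated assumptions, not the code in src/services/keyset-where-builder.ts: the names buildKeysetWhere, SortColumn, KeyValue and quoteIdent are hypothetical, values are bound as $n parameters rather than escaped inline, and NULL placement assumes the Postgres defaults (NULLS LAST on ASC, NULLS FIRST on DESC).

```typescript
// Hypothetical types and names, for illustration only.
type SortDir = "ASC" | "DESC";
type KeyValue = string | number | null;
interface SortColumn { name: string; dir: SortDir; }
type Bind = (value: KeyValue) => string;

// Quote an identifier, doubling any embedded double quotes.
const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;
const flip = (dir: SortDir): SortDir => (dir === "ASC" ? "DESC" : "ASC");

// Inequality rung: rows that sort strictly after `value` on one column.
// ASC puts NULLs last, so a NULL key has nothing after it; DESC puts NULLs first.
function strictlyAfter(col: string, dir: SortDir, value: KeyValue, bind: Bind): string {
  if (dir === "ASC") {
    return value === null ? "FALSE" : `(${col} > ${bind(value)} OR ${col} IS NULL)`;
  }
  return value === null ? `${col} IS NOT NULL` : `${col} < ${bind(value)}`;
}

// Build one rung per sort column: equality on all earlier columns
// (IS NOT DISTINCT FROM, so NULL = NULL matches), inequality on the current one.
// Backward traversal is modelled here by flipping every column's direction.
function buildKeysetWhere(
  sort: SortColumn[],
  key: KeyValue[],
  direction: "next" | "prev",
): { sql: string; params: KeyValue[] } {
  if (sort.length === 0 || sort.length !== key.length) {
    throw new Error("key tuple must match the sort columns");
  }
  const params: KeyValue[] = [];
  const bind: Bind = (value) => {
    params.push(value);
    return `$${params.length}`;
  };
  const rungs: string[] = [];
  for (let i = 0; i < sort.length; i++) {
    const parts: string[] = [];
    for (let j = 0; j < i; j++) {
      parts.push(`${quoteIdent(sort[j].name)} IS NOT DISTINCT FROM ${bind(key[j])}`);
    }
    const dir = direction === "next" ? sort[i].dir : flip(sort[i].dir);
    parts.push(strictlyAfter(quoteIdent(sort[i].name), dir, key[i], bind));
    rungs.push(`(${parts.join(" AND ")})`);
  }
  return { sql: rungs.join(" OR "), params };
}
```

With sort columns (year ASC, id DESC) and key tuple [2020, 7], the sketch yields two rungs: rows with a later or NULL year, or rows with the same year and a smaller id.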

What did not change

  • page_number is still accepted as before (no cap enforced server-side in this PR — the frontend caps at 100 today; backend cap will land separately).
  • Offset behaviour for callers passing an explicit sort_by is unchanged.
  • v1 endpoint is untouched.
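
Because page_number remains accepted alongside the new cursor parameter, the two have to be reconciled at parse time; the test plan states that cursor combined with page_number > 1 is rejected with a 400. A rough sketch of that rule is below. It is not the code in src/utils/parse-page-options.ts: the BadRequest class, the MAX_CURSOR_LENGTH value, and the error codes other than errors.invalid_cursor are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real PageOptions interface carries more fields.
interface PageOptions { pageNumber: number; cursor?: string; }

class BadRequest extends Error {
  readonly status = 400;
  readonly code: string;
  constructor(code: string) {
    super(code);
    this.code = code;
  }
}

// Assumed upper bound; the PR only says the validator enforces a max length.
const MAX_CURSOR_LENGTH = 1024;

function parsePageOptions(query: Record<string, string | undefined>): PageOptions {
  const pageNumber = query.page_number === undefined ? 1 : Number(query.page_number);
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new BadRequest("errors.invalid_page_number"); // hypothetical code
  }
  const cursor = query.cursor;
  if (cursor === undefined) {
    return { pageNumber }; // plain offset mode, unchanged
  }
  // Cheap shape check only: base64url alphabet and bounded length.
  if (cursor.length > MAX_CURSOR_LENGTH || !/^[A-Za-z0-9_-]+$/.test(cursor)) {
    throw new BadRequest("errors.invalid_cursor");
  }
  // cursor and page_number > 1 are mutually exclusive.
  if (pageNumber > 1) {
    throw new BadRequest("errors.cursor_with_page_number"); // hypothetical code
  }
  return { pageNumber, cursor };
}
```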

Notable design choices

  • Cursors are not signed. Integrity comes from binding to server-side facts (queryStoreId, revisionId, lang, sortHash) and revalidating on every decode. Tampering produces a 400, not silent corruption.
  • Compact token. The cursor carries only a version, a 22-char context hash, the direction and the key tuple — positional, not an object with named keys. The raw queryStoreId / revisionId are folded into the hash rather than transmitted, so a typical cursor is ~70 chars instead of ~220. The binding is unchanged in effect: a cursor replayed against different filters / revision / language / sort recomputes a different hash and 400s rather than silently mis-paginating.
  • sort_by change, language switch, or revision change all invalidate the cursor. Frontend strips cursor on language/sort changes (see paired frontend PR) so users land on page 1 of the new traversal rather than hitting a 400.
  • No new indexes. The "deep cursor reads ≈ shallow" performance guarantee only holds when the sort is index-served: the composite PK is, an arbitrary sort_by may not be. For arbitrary sorts the guarantee is treated as aspirational; per-cube sort indexes are a follow-up.
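
The unsigned, context-bound token can be sketched as follows. This is an assumption-laden illustration rather than the contents of src/utils/cursor-codec.ts: the hash construction (SHA-256 truncated to 16 bytes, which base64url-encodes to 22 characters), the "n" / "p" direction codes, and the names contextHash, encodeCursor, decodeCursor and InvalidCursorError are all hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { createHash } from "node:crypto";

const CURSOR_VERSION = 1;
type Direction = "n" | "p"; // assumed encoding of next / prev
type KeyValue = string | number | null;

// Server-side facts the cursor is bound to; never transmitted in the token.
interface CursorContext {
  queryStoreId: string;
  revisionId: string;
  language: string;
  sortHash: string;
}

class InvalidCursorError extends Error {}

// Assumed construction: 16 bytes of SHA-256 -> 22 base64url characters.
function contextHash(ctx: CursorContext): string {
  const material = JSON.stringify([ctx.queryStoreId, ctx.revisionId, ctx.language, ctx.sortHash]);
  return createHash("sha256").update(material).digest().subarray(0, 16).toString("base64url");
}

// Positional payload [version, contextHash, direction, keyTuple].
function encodeCursor(ctx: CursorContext, direction: Direction, key: KeyValue[]): string {
  const payload = [CURSOR_VERSION, contextHash(ctx), direction, key];
  return Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
}

// Decode and revalidate against the context of the *current* request.
function decodeCursor(token: string, ctx: CursorContext): { direction: Direction; key: KeyValue[] } {
  let payload: unknown;
  try {
    payload = JSON.parse(Buffer.from(token, "base64url").toString("utf8"));
  } catch {
    throw new InvalidCursorError("errors.invalid_cursor");
  }
  if (!Array.isArray(payload) || payload.length !== 4) {
    throw new InvalidCursorError("errors.invalid_cursor");
  }
  const [version, hash, direction, key] = payload as unknown[];
  const directionOk = direction === "n" || direction === "p";
  if (version !== CURSOR_VERSION || hash !== contextHash(ctx) || !directionOk || !Array.isArray(key)) {
    throw new InvalidCursorError("errors.invalid_cursor");
  }
  return { direction: direction as Direction, key: key as KeyValue[] };
}
```

In this sketch a cursor minted for one language and replayed under another recomputes a different hash and is rejected, which mirrors the 400 behaviour described above without needing a signature.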

Test plan

  • npm run check is green
  • Forward cursor traversal across a small dataset matches offset-paged concatenation
  • Backward cursor traversal returns rows in the correct order
  • next_cursor is null on the last page
  • Cross-queryStore / cross-revision / cross-language / cross-sort cursors → 400
  • cursor + page_number > 1 → 400 (mutually exclusive)
  • Dataset with a time column orders by it; dataset without orders by first dim
  • No duplicates or gaps when sort-key values tie
  • Swagger UI shows the new cursor parameter on both v2 data routes
  • Manual perf smoke against a 3M+ row cube: compare page_number=1 vs page_number=100 (offset) vs cursor=<deep> (cursor) — record in PR before merge
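
The first and eighth checks (cursor traversal matches offset-paged concatenation; no duplicates or gaps on ties) can be illustrated with an in-memory model. This is a toy sketch, not the integration test in the PR: the Row shape, the (year, id) ordering with id as the unique tie-breaker, and every function name are assumptions.

```typescript
// Toy fact rows: `year` ties are broken by the unique `id`, standing in
// for "sort column + composite PK tie-breakers".
interface Row { id: number; year: number; }

const compareRows = (a: Row, b: Row): number => a.year - b.year || a.id - b.id;

// Offset paging: sort, then skip (page - 1) * size rows.
function offsetPage(rows: Row[], page: number, size: number): Row[] {
  return [...rows].sort(compareRows).slice((page - 1) * size, page * size);
}

// Keyset paging: sort, keep only rows strictly after the last seen key.
function keysetPage(rows: Row[], after: Row | null, size: number): Row[] {
  return [...rows]
    .sort(compareRows)
    .filter((row) => after === null || compareRows(row, after) > 0)
    .slice(0, size);
}

// Walk the whole dataset forward, one keyset page at a time.
function traverseByCursor(rows: Row[], size: number): Row[] {
  const seen: Row[] = [];
  let after: Row | null = null;
  for (;;) {
    const page = keysetPage(rows, after, size);
    if (page.length === 0) break;
    seen.push(...page);
    after = page[page.length - 1];
  }
  return seen;
}

// Concatenate offset pages until an empty page is returned.
function traverseByOffset(rows: Row[], size: number): Row[] {
  const seen: Row[] = [];
  for (let page = 1; ; page++) {
    const chunk = offsetPage(rows, page, size);
    if (chunk.length === 0) break;
    seen.push(...chunk);
  }
  return seen;
}
```

Because the tie-breaker makes the ordering total, both traversals visit each row exactly once and in the same order, even when several rows share a year and a page boundary falls inside the tie.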

Copilot AI review requested due to automatic review settings May 29, 2026 13:24

Copilot AI left a comment

Contributor


Pull request overview

Adds keyset (cursor) pagination support to the v2 consumer data endpoint (GET /v2/:dataset_id/data, including filtered variant) to avoid deep LIMIT/OFFSET reads and enable forward/backward navigation via opaque cursors.

Changes:

  • Introduces cursor query param parsing/validation and extends PageOptions to carry it.
  • Implements cursor encoding/decoding, deterministic default sort resolution, and keyset WHERE-clause construction; wires these into consumer-view-v2 query building and frontend response envelopes (next_cursor / prev_cursor).
  • Regenerates/updates Swagger/OpenAPI docs and adds unit/integration test coverage for cursor mode.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/unit/utils/parse-page-options.test.ts | Adds unit tests for cursor parsing + mutual exclusivity with page_number > 1. |
| test/unit/utils/cursor-codec.test.ts | Adds unit tests for cursor codec round-trips and invalid cursor rejection cases. |
| test/unit/services/keyset-where-builder.test.ts | Adds unit tests for keyset predicate generation (multi-column, NULL semantics, escaping). |
| test/unit/services/default-sort-resolver.test.ts | Adds unit tests ensuring default sort selection and PK tie-breakers are deterministic and language-aware. |
| test/unit/services/consumer-view-v2.test.ts | Updates tests for new BuildDataQueryResult shape and adds cursor-mode negative cases. |
| test/integration/routes/consumer-v2/data-and-pivot.test.ts | Adds integration coverage for cursor traversal forward/backward and invalid cursor scenarios. |
| src/validators/index.ts | Adds cursorValidator enforcing type and max length. |
| src/utils/parse-page-options.ts | Plumbs cursor into parsed page options and enforces cursor/page_number exclusivity. |
| src/utils/cursor-codec.ts | New: base64url JSON cursor codec + sort-hash binding and validation. |
| src/services/keyset-where-builder.ts | New: OR-ladder keyset WHERE clause builder with explicit NULL handling. |
| src/services/default-sort-resolver.ts | New: resolves default deterministic sort and PK tie-breakers from dataset fact table + column mapping. |
| src/services/consumer-view-v2.ts | Core implementation: cursor/offset query building, _sort projection injection, and response page_info cursor fields. |
| src/routes/consumer/v2/schema.ts | Adds cursor query parameter to Swagger schema definitions. |
| src/routes/consumer/v2/api.ts | Wires cursor param into swagger refs and expands endpoint descriptions for pagination modes. |
| src/routes/consumer/v2/openapi-en.json | Regenerated OpenAPI (EN) including cursor param and updated descriptions. |
| src/routes/consumer/v2/openapi-cy.json | Regenerated OpenAPI (CY) including cursor param. |
| src/interfaces/page-options.ts | Adds optional cursor to PageOptions. |
| src/controllers/dataset.ts | Updates preview controller to pass dataset (with factTable) into buildDataQuery and forward BuildDataQueryResult. |
| src/controllers/consumer-v2.ts | Updates consumer controller to ensure dataset is loaded with factTable for deterministic sorting and forwards BuildDataQueryResult. |

Comment thread src/services/consumer-view-v2.ts
Comment thread test/integration/routes/consumer-v2/data-and-pivot.test.ts
@wheelsandcogs

Collaborator Author

Pushed since the last review:

  • Rebased onto main (clean, no conflicts).
  • Addressed both review comments: the backward-traversal cursor logic and the leftover console.log were already fixed in the "fix review comments" commit; replies are on the threads.
  • Shortened the cursor token (commit "reduce cursor string size"): the four binding fields (queryStoreId, revisionId, language, sortHash) now collapse into a single 22-char context hash, and the payload serialises as a compact positional array [v, c, d, k] instead of an object. A typical cursor drops from ~223 to ~71 chars. The raw ids are no longer transmitted, only folded into the hash, but the binding is unchanged in effect (cross-query / revision / language / sort cursors still 400). CURSOR_VERSION stays 1 since the format never shipped.

npm run check green (build, lint, 1711 tests).

@j-maynard j-maynard left a comment

Collaborator


LGTM 🤘🏻

@wheelsandcogs wheelsandcogs merged commit f5b56d7 into main Jun 3, 2026
6 checks passed
@wheelsandcogs wheelsandcogs deleted the SW-1246 branch June 3, 2026 15:40