Skip to content

fix(cohorts): fix cross-DB alias error in static cohort insertion#53804

Open
dmarticus wants to merge 2 commits intomasterfrom
dmarticus/fix-static-cohorts
Open

fix(cohorts): fix cross-DB alias error in static cohort insertion#53804
dmarticus wants to merge 2 commits intomasterfrom
dmarticus/fix-static-cohorts

Conversation

@dmarticus
Copy link
Copy Markdown
Contributor

Problem

#52819 fixed static cohort duplication truncation but was reverted (#53670) because it broke adding users to static cohorts in production.

The root cause: the ClickHouse dedup path mixed db_read (persons_db_reader) and db_write (persons_db_writer) Django database aliases in a single queryset with .exclude(). Django rejects cross-alias subqueries — even when both aliases point to the same physical database — raising:

ValueError: Subqueries aren't allowed across different databases.

This error was silently swallowed by the except block in _insert_users_list_with_batching, so API calls returned 200 but cohort membership was never updated. Tests passed because db_read == db_write in the test config.

Changes

  • CH dedup path: Use db_write for both Person and CohortPeople querysets so Django can merge the .exclude() into a single NOT IN subquery. No cross-DB error, no memory explosion.
  • PG insert path: Replace the string-replacement hack (UPDATE_QUERY.format(sql.replace(...))) with a LEFT JOIN via raw SQL on the db_write cursor. Dedup stays entirely in SQL, avoiding the O(cohort_size) memory cost of loading all existing member IDs into Python.
  • Remove unused UPDATE_QUERY constant.
  • Re-apply admin improvements from fix(cohorts): fix static cohort duplication truncation and add admin re-sync #52819 (resync action, list fields).
  • Add regression test that simulates production's split DB aliases.

How did you test this code?

  • All 40 static cohort API tests pass
  • All 4 feature flag cohort generation tests pass (with updated query counts)
  • All 5 cohort model insertion tests pass
  • Regression test validates the fix: passes with db_write for both querysets, fails with db_read + db_write mix (reproduces the production ValueError)

Publish to changelog?

No

Docs update

N/A

LLM context

Re-land of #52819 with the cross-DB alias bug fixed. The key insight is that Django checks database aliases, not physical databases, when validating subqueries. Production uses separate persons_db_reader/persons_db_writer aliases, but tests use a single alias for both, which is why the original PR's tests passed.

The original fix (#52819) for static cohort duplication truncation was
reverted because it caused a ValueError in production:

  "Subqueries aren't allowed across different databases"

Root cause: the CH dedup path created a queryset mixing db_read
(persons_db_reader) and db_write (persons_db_writer) aliases. Django
rejects cross-alias subqueries even when both point to the same
physical database. The error was silently swallowed by the except
block, so API calls returned 200 but cohort membership never updated.

Fix: use db_write for both Person and CohortPeople in the CH dedup
path so Django can merge the .exclude() into a single NOT IN subquery.
The PG path uses a LEFT JOIN via raw cursor (bypasses Django's check).

Also re-applies the admin improvements and adds a regression test that
simulates production's split DB aliases.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Vulnerabilities

No security concerns identified. The new f-string SQL interpolates only self.pk (Django auto-incremented integer PK) and self.version or "NULL" (integer or SQL NULL literal) — neither is user-controlled. The inner subquery comes from Django ORM's sql_with_params(), with user-supplied values kept in the params tuple passed separately to cursor.execute(). Table names are sourced from Model._meta.db_table (compile-time constants).

Prompt To Fix All With AI
This is a comment left during a code review.
Path: posthog/models/cohort/cohort.py
Line: 727-735

Comment:
**Missing `ON CONFLICT DO NOTHING` for concurrent inserts**

The removed `UPDATE_QUERY` had `ON CONFLICT DO NOTHING` to handle the race window between the SELECT and the INSERT. The new LEFT JOIN dedup closes the race most of the time, but if two concurrent processes run the same batch simultaneously, both can pass the `WHERE cp."person_id" IS NULL` check before either inserts, and the second will hit a unique-constraint violation. The `except Exception` handler swallows it, silently skipping the entire batch.

Adding `ON CONFLICT DO NOTHING` to the tail of the INSERT restores the original behaviour at no cost:
```suggestion
                query = f"""
                    INSERT INTO "{cohort_people_table}" ("person_id", "cohort_id", "version")
                    SELECT p."id", {self.pk}, {self.version or "NULL"}
                    FROM ({sql}) AS p
                    LEFT JOIN "{cohort_people_table}" AS cp
                        ON cp."person_id" = p."id" AND cp."cohort_id" = {self.pk}
                    WHERE cp."person_id" IS NULL
                    ON CONFLICT DO NOTHING
                """
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(cohorts): fix cross-DB alias error i..." | Re-trigger Greptile

Defense-in-depth for concurrent inserts, matching the original query's
behavior.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

🎭 Playwright report · View test results →

⚠️ 2 flaky tests:

  • Inline editing insight title via compact card popover (chromium)
  • Materialize view pane (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

Copy link
Copy Markdown
Contributor

@gustavohstrassburger gustavohstrassburger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope this works.

@github-project-automation github-project-automation bot moved this from In Review to Approved in Feature Flags Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Approved

Development

Successfully merging this pull request may close these issues.

2 participants