feat: Add first round of Warpstream topics to Clickhouse#53820

Open
jose-sequeira wants to merge 4 commits into master from jose-sequeira/warpstream-kafka-engines

Conversation

@jose-sequeira
Contributor

Problem

ClickHouse Kafka engine tables for log_entries, clickhouse_app_metrics2, and clickhouse_tophog currently only exist for MSK. We need equivalent tables reading from the same topics via a WarpStream cluster to support the WarpStream migration.

Changes

  • Renamed CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION to CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION for clarity, since it refers specifically to the ingestion cluster
  • Added WarpStream consumer group constants (_ws suffix) to avoid conflicts with existing MSK consumer groups
  • Added WarpStream Kafka engine tables and materialized views for log_entries, app_metrics2, and tophog that coexist alongside the existing MSK tables and write to the same target tables
  • Added ClickHouse migration 0227 to create all 6 new tables, matching the node roles of their MSK counterparts (INGESTION_SMALL for log_entries, INGESTION_MEDIUM for app_metrics2 and tophog)
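As a rough illustration of the pattern described above, the WarpStream variants might look like this (all names, topics, and engine settings here are hypothetical stand-ins; the real definitions live in the PR's `posthog/models/*/sql.py` files and the named collection in `posthog/settings/data_stores.py`):

```python
# Hypothetical sketch of the WarpStream table/constant pattern; the actual
# names, topics, and engine settings are defined in the PR itself.
KAFKA_LOG_ENTRIES_TABLE_NAME = "kafka_log_entries"        # existing MSK engine table (assumed name)
KAFKA_LOG_ENTRIES_WS_TABLE_NAME = "kafka_log_entries_ws"  # new WarpStream engine table (assumed name)

# Consumer groups get a `_ws` suffix so WarpStream offsets never collide with MSK's.
LOG_ENTRIES_CONSUMER_GROUP = "clickhouse-log-entries"
LOG_ENTRIES_WS_CONSUMER_GROUP = LOG_ENTRIES_CONSUMER_GROUP + "_ws"


def KAFKA_LOG_ENTRIES_WS_TABLE_SQL() -> str:
    # Credentials come from the ClickHouse named collection, not from the SQL.
    return f"""
CREATE TABLE IF NOT EXISTS {KAFKA_LOG_ENTRIES_WS_TABLE_NAME}
ENGINE = Kafka(warpstream_ingestion,
               kafka_topic_list = 'log_entries',
               kafka_group_name = '{LOG_ENTRIES_WS_CONSUMER_GROUP}',
               kafka_format = 'JSONEachRow')
"""
```

Both the MSK and WarpStream engine tables then feed materialized views writing to the same target table, so the migration can cut over by topic without duplicating storage.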

How did you test this code?


Publish to changelog?

Docs update

@jose-sequeira jose-sequeira requested a review from a team as a code owner April 9, 2026 06:43
@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

Vulnerabilities

No security concerns identified. The new tables read from Kafka topics using named collections (no credentials in SQL), consumer groups are namespaced to avoid conflicts, and the changes follow the same auth/connection patterns as the existing MSK tables.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: posthog/settings/data_stores.py
Line: 327-330

Comment:
**Silent env var rename breaks existing deployments**

The environment variable key changed from `CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION` to `CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION`. Any deployment that already has `CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION` set (e.g. in Kubernetes secrets or Terraform) will silently fall back to the default `"warpstream_ingestion"` — the old var is simply ignored. If the deployed value differs from the default, `distinct_id_usage`'s Kafka engine table (the only current consumer) will connect to the wrong named collection without any error.

Consider reading both names and picking the old one as a fallback if the new one is unset, or coordinating the env-var rename with infra config changes in the same deployment.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/models/tophog/sql.py
Line: 145-164

Comment:
**Duplicate MV function (OnceAndOnlyOnce)**

`TOPHOG_WS_MV_SQL` and `TOPHOG_MV_SQL` are identical except for which name constants they inject (`WS_MV_NAME`/`KAFKA_WS_TABLE_NAME` vs `MV_NAME`/`KAFKA_TABLE_NAME`). Both functions could be collapsed into the existing one by accepting `mv_name` and `kafka_table` parameters, eliminating the duplication entirely. The same pattern applies to `LOG_ENTRIES_WS_MV_SQL` vs `LOG_ENTRIES_V3_TABLE_MV_SQL` in `log_entries.py`, and `APP_METRICS2_WS_MV_TABLE_SQL` vs `APP_METRICS2_MV_TABLE_SQL` in `app_metrics2/sql.py`.

```python
# Example refactor for tophog:
def TOPHOG_MV_SQL(
    target_table: str = WRITABLE_TABLE_NAME,
    mv_name: str = MV_NAME,
    kafka_table: str = KAFKA_TABLE_NAME,
):
    return """
CREATE MATERIALIZED VIEW IF NOT EXISTS {mv_name}
TO {target_table}
AS SELECT ...
FROM {kafka_table}
""".format(mv_name=mv_name, target_table=target_table, kafka_table=kafka_table)

# WS variant becomes a thin wrapper:
def TOPHOG_WS_MV_SQL(target_table: str = WRITABLE_TABLE_NAME):
    return TOPHOG_MV_SQL(
        target_table=target_table, mv_name=WS_MV_NAME, kafka_table=KAFKA_WS_TABLE_NAME
    )
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat: Add first round of Warpstream topi..."

Comment on lines 327 to +330

```diff
 CLICKHOUSE_KAFKA_NAMED_COLLECTION: str = os.getenv("CLICKHOUSE_KAFKA_NAMED_COLLECTION", "msk_cluster")
-CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION: str = os.getenv(
-    "CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION", "warpstream_ingestion"
-)
+CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION: str = os.getenv(
+    "CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION", "warpstream_ingestion"
+)
```
Contributor


P1 Silent env var rename breaks existing deployments

The environment variable key changed from CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION to CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION. Any deployment that already has CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION set (e.g. in Kubernetes secrets or Terraform) will silently fall back to the default "warpstream_ingestion" — the old var is simply ignored. If the deployed value differs from the default, distinct_id_usage's Kafka engine table (the only current consumer) will connect to the wrong named collection without any error.

Consider reading both names and picking the old one as a fallback if the new one is unset, or coordinating the env-var rename with infra config changes in the same deployment.


Contributor Author


@orian this variable is not defined in any chart, so it always falls back to the default.

@github-actions
Contributor

github-actions bot commented Apr 9, 2026

🎭 Playwright report · View test results →

⚠️ 1 flaky test:

  • Materialize view pane (chromium)

These issues are not necessarily caused by your changes.

@tests-posthog
Contributor

tests-posthog bot commented Apr 9, 2026

Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

Contributor

@orian orian left a comment


Please take a look at the Greptile comment.
