feat: Add first round of Warpstream topics to Clickhouse #53820
jose-sequeira wants to merge 4 commits into master
```diff
 CLICKHOUSE_KAFKA_NAMED_COLLECTION: str = os.getenv("CLICKHOUSE_KAFKA_NAMED_COLLECTION", "msk_cluster")
-CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION: str = os.getenv(
-    "CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION", "warpstream_ingestion"
+CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION: str = os.getenv(
+    "CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION", "warpstream_ingestion"
```
Silent env var rename breaks existing deployments
The environment variable key changed from CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION to CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION. Any deployment that already has CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION set (e.g. in Kubernetes secrets or Terraform) will silently fall back to the default "warpstream_ingestion" — the old var is simply ignored. If the deployed value differs from the default, distinct_id_usage's Kafka engine table (the only current consumer) will connect to the wrong named collection without any error.
Consider reading both names and picking the old one as a fallback if the new one is unset, or coordinating the env-var rename with infra config changes in the same deployment.
File: `posthog/settings/data_stores.py`, lines 327-330
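A minimal sketch of the fallback suggested in the review comment, assuming it would live in `posthog/settings/data_stores.py` next to the existing `os.getenv` calls. The helper `get_env_with_legacy_fallback` and the deprecation warning are hypothetical additions for illustration; they are not part of this PR, which reads only the new name.

```python
import os
import warnings

# Default value taken from the diff above.
DEFAULT_WARPSTREAM_INGESTION_NAMED_COLLECTION = "warpstream_ingestion"


def get_env_with_legacy_fallback(new_name: str, legacy_name: str, default: str) -> str:
    """Read `new_name`; if unset, fall back to `legacy_name`; if both are unset, use `default`.

    Hypothetical helper, not part of PR #53820.
    """
    value = os.getenv(new_name)
    if value is not None:
        return value

    legacy_value = os.getenv(legacy_name)
    if legacy_value is not None:
        # Warn so deployments still using the old name can be found and migrated.
        warnings.warn(
            f"{legacy_name} is deprecated; set {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy_value

    return default


CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION: str = get_env_with_legacy_fallback(
    "CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION",
    "CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION",
    DEFAULT_WARPSTREAM_INGESTION_NAMED_COLLECTION,
)
```

The new name always wins. A value still set under the old name is honored, with a warning, until infra config is migrated, after which the legacy lookup can be deleted. If, as the reply below states, neither name is set in any chart, this sketch and the PR as written both resolve to `"warpstream_ingestion"`.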
@orian This variable is not defined in any chart, so it always falls back to the default.
Query snapshots: Backend query snapshots updated. Changes: 1 snapshot (1 modified, 0 added, 0 deleted).
orian left a comment
Please take a look at the Greptile comment.
Problem

ClickHouse Kafka engine tables for `log_entries`, `clickhouse_app_metrics2`, and `clickhouse_tophog` currently only exist for MSK. We need equivalent tables reading from the same topics via a WarpStream cluster to support the WarpStream migration.

Changes

- Renamed `CLICKHOUSE_KAFKA_WARPSTREAM_NAMED_COLLECTION` to `CLICKHOUSE_KAFKA_WARPSTREAM_INGESTION_NAMED_COLLECTION` for clarity, since it refers specifically to the ingestion cluster
- … (`_ws` suffix) to avoid conflicts with existing MSK consumer groups
- … `log_entries`, `app_metrics2`, and `tophog` that coexist alongside the existing MSK tables and write to the same target tables
- … (`INGESTION_SMALL` for `log_entries`, `INGESTION_MEDIUM` for `app_metrics2` and `tophog`)

How did you test this code?
Publish to changelog?
Docs update