Skip to content

fix(ingestion): mv acceptance tests to query CH directly#53825

Merged
nickbest-ph merged 2 commits intomasterfrom
nick/fix-ingestion-acceptance-tests
Apr 9, 2026
Merged

fix(ingestion): mv acceptance tests to query CH directly#53825
nickbest-ph merged 2 commits intomasterfrom
nick/fix-ingestion-acceptance-tests

Conversation

@nickbest-ph
Copy link
Copy Markdown
Contributor

Problem

The ingestion acceptance tests (a Temporal scheduled workflow that runs every 10 minutes) query ClickHouse via the PostHog HogQL API, authenticated with a personal API key. This adds an unnecessary dependency on the web tier being healthy and requires managing a PERSONAL_API_KEY credential. Since the only goal of these tests is to verify that events land in ClickHouse, we can query ClickHouse directly from the Temporal worker, which already has ClickHouse credentials via Django settings.

Changes

  • config.py: Removed personal_api_key and project_id fields, added team_id: int (env var: INGESTION_ACCEPTANCE_TEST_TEAM_ID)
  • client.py: Replaced HTTP session + HogQL API queries with sync_execute from posthog.clickhouse.client.execute. Queries now go directly to ClickHouse using real table names (events, person, person_distinct_id2) with team_id filtering. Removed all HTTP query infrastructure (_create_http_session, _execute_hogql_query, _execute_hogql_query_all, HTTP retry/resilience config)
  • activities.py, __main__.py, slack.py, terminal_report.py: Updated references from project_id to team_id
  • README.md: Updated architecture diagram, configuration table, and descriptions to reflect direct ClickHouse queries
  • Unit tests: Rewrote client query tests to mock sync_execute instead of HTTP responses, updated all Config fixtures to use team_id instead of project_id/personal_api_key

Deployment note: Env var changes required:

  • Remove: INGESTION_ACCEPTANCE_TEST_PERSONAL_API_KEY, INGESTION_ACCEPTANCE_TEST_PROJECT_ID
  • Add: INGESTION_ACCEPTANCE_TEST_TEAM_ID

How did you test this code?

All 36 unit tests pass (pytest posthog/temporal/tests/ingestion_acceptance_test/ -v). Linting passes with ruff check and ruff format --check.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 9, 2026

Vulnerabilities

No security concerns identified. Removing the personal API key from the configuration and querying ClickHouse directly (with parameterised queries and team_id scoping) reduces the attack surface compared to the previous design.

Comments Outside Diff (1)

  1. posthog/temporal/ingestion_acceptance_test/activities.py, line 24-33 (link)

    P2 Stale env-var list in docstring

    The docstring still lists INGESTION_ACCEPTANCE_TEST_PROJECT_ID and INGESTION_ACCEPTANCE_TEST_PERSONAL_API_KEY, both of which have been removed. It's missing INGESTION_ACCEPTANCE_TEST_TEAM_ID.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: posthog/temporal/ingestion_acceptance_test/activities.py
    Line: 24-33
    
    Comment:
    **Stale env-var list in docstring**
    
    The docstring still lists `INGESTION_ACCEPTANCE_TEST_PROJECT_ID` and `INGESTION_ACCEPTANCE_TEST_PERSONAL_API_KEY`, both of which have been removed. It's missing `INGESTION_ACCEPTANCE_TEST_TEAM_ID`.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: posthog/temporal/ingestion_acceptance_test/activities.py
Line: 24-33

Comment:
**Stale env-var list in docstring**

The docstring still lists `INGESTION_ACCEPTANCE_TEST_PROJECT_ID` and `INGESTION_ACCEPTANCE_TEST_PERSONAL_API_KEY`, both of which have been removed. It's missing `INGESTION_ACCEPTANCE_TEST_TEAM_ID`.

```suggestion
    Configuration is loaded from environment variables:
    - INGESTION_ACCEPTANCE_TEST_API_HOST
    - INGESTION_ACCEPTANCE_TEST_PROJECT_API_KEY
    - INGESTION_ACCEPTANCE_TEST_TEAM_ID
    - INGESTION_ACCEPTANCE_TEST_EVENT_TIMEOUT_SECONDS (optional, default 3600)
    - INGESTION_ACCEPTANCE_TEST_POLL_INTERVAL_SECONDS (optional, default 10.0)
    - INGESTION_ACCEPTANCE_TEST_ACTIVITY_TIMEOUT_SECONDS (optional, default 3600)
    - INGESTION_ACCEPTANCE_TEST_SLACK_WEBHOOK_URL (optional, for Slack notifications)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/temporal/ingestion_acceptance_test/client.py
Line: 271-293

Comment:
**Transient ClickHouse errors are no longer tolerated during polling**

The old `_poll_until_found` caught `requests.exceptions.RequestException` and logged a warning so polling could continue on transient network failures. The new code calls `fetch_fn()` (which calls `sync_execute`) with no exception handling — any transient ClickHouse error (connection reset, query timeout, etc.) will propagate out of the polling loop and fail the test immediately.

For a production monitoring workflow that's meant to distinguish real ingestion failures from infra blips, this regression turns every transient ClickHouse hiccup into a false positive. Consider wrapping the `fetch_fn()` call in a try/except that catches ClickHouse / database exceptions and logs a warning before continuing the loop.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/temporal/ingestion_acceptance_test/client.py
Line: 335-365

Comment:
**Missing `p.is_deleted = 0` filter on the `person` table**

`person` is a `ReplacingMergeTree`. Before ClickHouse merges parts, both the original and the replacement row (with `is_deleted = 1` and a higher version) coexist. `ORDER BY p.version DESC LIMIT 1` will therefore return the soft-deleted row when the merge hasn't run yet, causing the query to return a `Person` object that should be treated as absent.

The standard pattern in this codebase is to add `AND p.is_deleted = 0` (or use `FINAL`) to exclude logically-deleted records:

```sql
WHERE p.team_id = %(team_id)s
  AND pdi.distinct_id = %(distinct_id)s
  AND pdi.is_deleted = 0
  AND p.is_deleted = 0
ORDER BY p.version DESC
LIMIT 1
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(ingestion): mv acceptance tests to q..." | Re-trigger Greptile

@nickbest-ph nickbest-ph merged commit 76bc069 into master Apr 9, 2026
200 checks passed
@nickbest-ph nickbest-ph deleted the nick/fix-ingestion-acceptance-tests branch April 9, 2026 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants