E2E integration test reliability and worker logging gaps #135

@mihow

Context

Found during E2E testing of RolnickLab/antenna#1197 + #134.

Issues

1. psv2_integration_test.sh populate step fails intermittently

The api_post_empty call to /captures/collections/{id}/populate/ sometimes fails: curl -sf exits with code 22, which -f uses to signal an HTTP response status of 400 or above, even though the endpoint returns 200 when called manually moments later. Because the script runs under set -euo pipefail, that single failure aborts the entire script.

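Likely cause: a race condition between collection creation and populate, or a transient server-side error that curl -sf turns into a hard failure. (A pure connection failure would normally produce a different curl exit code, such as 7, rather than 22.)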

Suggested fix: add a short retry loop around the populate call, or replace curl -sf with a function that retries on transient failures.
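A minimal sketch of the retry approach, assuming api_post_empty takes the endpoint path as its only argument (its real signature is not shown in this issue). The retry helper and its parameter names are hypothetical, not existing functions in the script.

```shell
set -euo pipefail

# Hypothetical helper: run a command up to max_attempts times,
# sleeping delay_seconds between failed attempts.
retry() {
    local max_attempts="$1" delay_seconds="$2"
    shift 2
    local attempt=1
    # A command used as an "until" condition is exempt from errexit,
    # so a failing attempt does not abort the script.
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "retry: giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt ${attempt} failed, retrying in ${delay_seconds}s" >&2
        sleep "$delay_seconds"
        attempt=$(( attempt + 1 ))
    done
}

# Assumed usage inside psv2_integration_test.sh (COLLECTION_ID is a placeholder):
#   retry 5 2 api_post_empty "/captures/collections/${COLLECTION_ID}/populate/"
```

If the failure really is a race, a fixed delay of a second or two is probably enough. A retry also hides the underlying cause, so logging each failed attempt to stderr, as above, keeps the flakiness visible in CI output.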

2. Worker "Done" summary lines missing from log files

When the worker spawns per-GPU subprocesses (Found 2 GPUs, spawning one AMI worker instance per GPU), the batch completion summary lines (Done, detections: N. Detecting time: ...) are emitted only by the subprocess that processed the batch. When the worker's output is redirected to a log file, these lines sometimes do not appear, because the parent process's log stream captures only its own output.

The psv2_integration_test.sh script greps for Done, detections: in the worker log to show timing, but this is unreliable with multi-GPU workers.

3. Worker log analysis section in test script sometimes skipped

The integration test script uses set -euo pipefail. If grep finds no matches (e.g., no errors in the logs), it returns exit code 1, which causes the script to exit before printing the final PASS/FAIL verdict. As a result, clean runs are reported as failures.
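One way to keep grep from tripping errexit is to wrap it so that exit code 1 (no matches) counts as success while real errors (exit code 2 or higher, such as a missing file) still propagate. The following is a sketch; count_matches and WORKER_LOG are hypothetical names, not ones taken from the script.

```shell
set -euo pipefail

# Hypothetical helper: print the number of lines in a file that match a pattern.
# grep exits 1 when nothing matches; that is treated as a count of 0,
# while exit codes above 1 (real grep errors) are passed through.
count_matches() {
    local pattern="$1" file="$2" count status=0
    # "|| status=$?" captures the exit code without triggering errexit.
    count="$(grep -c -- "$pattern" "$file")" || status=$?
    if (( status > 1 )); then
        return "$status"
    fi
    echo "$count"
}

# Assumed usage in the log analysis section:
#   error_count="$(count_matches 'ERROR' "$WORKER_LOG")"
#   done_count="$(count_matches 'Done, detections:' "$WORKER_LOG")"
```

A plain `grep ... || true` would also stop the early exit, but it would hide genuine failures such as an unreadable log file.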

4. POST URLs require trailing slash — easy to miss

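With APPEND_SLASH enabled, Django silently redirects GET requests that lack a trailing slash, but a POST without one returned a 500. This is consistent with Django's CommonMiddleware, which raises a RuntimeError for POST requests when DEBUG is on, because the redirect would drop the POST data. This caused the first E2E test failure. The error message is clear in the Django logs, but the worker only sees a generic 500.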

Suggestion: document this in the ADC's CLAUDE.md or add a note in the Antenna API docs. Alternatively, the ADC HTTP client could normalize URLs to always include a trailing slash.
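The normalization idea can be sketched in shell, to match the test script. The ADC HTTP client would need the equivalent logic in Python. ensure_trailing_slash is a hypothetical helper and the host below is a placeholder. The sketch appends a slash to the path only and leaves any query string untouched.

```shell
set -euo pipefail

# Hypothetical helper: make sure the path part of a URL ends with "/",
# keeping any "?query" suffix as-is.
ensure_trailing_slash() {
    local url="$1" path query=""
    path="${url%%\?*}"            # everything before the first "?"
    if [[ "$url" == *\?* ]]; then
        query="?${url#*\?}"       # everything after the first "?"
    fi
    if [[ "$path" != */ ]]; then
        path="${path}/"
    fi
    printf '%s%s\n' "$path" "$query"
}

# Example with a placeholder host:
#   ensure_trailing_slash "http://localhost:8000/captures/collections/7/populate"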

Affected files

  • scripts/psv2_integration_test.sh (Antenna repo)
  • trapdata/antenna/datasets.py (trailing slash)
  • Worker subprocess logging infrastructure
