gcp_pubsub source: expected StreamingPull stream closures logged as ERROR #25151

@andylibrian

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

The gcp_pubsub source emits ERROR-level logs and increments component_errors_total when Google's Pub/Sub server closes a StreamingPull connection for an expected reason. The error message:

ERROR vector::internal_events::gcp_pubsub: Failed to fetch events. error=status: Unavailable, message: "The StreamingPull stream closed for an expected reason and should be recreated, which is done automatically if using Cloud Pub/Sub client libraries. Refer to https://cloud.google.com/pubsub/docs/pull#streamingpull for more information." details: [], metadata: MetadataMap { headers: {"content-disposition": "attachment"} } error_code="failed_fetching_events" error_type="request_failed" stage="receiving"

Google's documentation describes these periodic stream closures as expected behavior:

"The StreamingPull API keeps an open connection. The Pub/Sub servers recurrently close the connection after a time period to avoid a long-running sticky connection. The client library automatically reopens a StreamingPull connection."

https://cloud.google.com/pubsub/docs/pull#streamingpull

The error message itself says the closure is "for an expected reason." However, Vector treats it as a real error: translate_error() only special-cases HTTP/2-level resets via is_reset(), whereas this closure arrives as a gRPC-level Unavailable status, which is_reset() does not match.

Impact

  • False alerting: ERROR logs trigger monitoring alerts for routine server behavior.
  • Misleading metrics: component_errors_total is inflated, making it unreliable for detecting real problems.
  • Unnecessary retry delay: Vector waits retry_delay_secs (default 1s) before reconnecting, when an immediate reconnect would be appropriate.

Configuration

sources:
    logs_from_gcp:
      type: gcp_pubsub
      project: "my-gcp-project"
      subscription: "my-subscription"

Version

v0.55.0

Debug Output


Example Data

No response

Additional Context

Root Cause

In src/sources/gcp_pubsub.rs, translate_error() delegates to is_reset(), which only detects HTTP/2-level resets by traversing the error source chain (tonic::Status → hyper::Error → h2::Error). The expected GCP closure arrives as a gRPC-level tonic::Status with Code::Unavailable, so is_reset() returns false and the error falls through to GcpPubsubReceiveError, which emits the ERROR log and increments component_errors_total.
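
To make the gap concrete, here is a minimal sketch of a status-level check that could sit alongside is_reset(). It is not Vector's code: Code, Status, Reaction, is_expected_closure() and classify() are hypothetical stand-ins (Vector uses tonic::Status and tonic::Code), so the snippet compiles with the standard library alone. Matching on the message text is an assumption taken from the log line quoted above and may be brittle if Google changes the wording.

```rust
use std::time::Duration;

/// Stand-in for the gRPC status code; Vector itself uses `tonic::Code`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    Unavailable,
    Internal,
}

/// Stand-in for `tonic::Status`, reduced to the two fields inspected here.
#[derive(Debug, Clone)]
pub struct Status {
    pub code: Code,
    pub message: String,
}

/// Hypothetical description of how the receive loop could react to a stream error.
#[derive(Debug, PartialEq, Eq)]
pub enum Reaction {
    /// Routine closure: reconnect right away, no ERROR log, no counter increment.
    ReconnectNow,
    /// Genuine failure: report it and wait for the configured delay before retrying.
    ReportAndRetryAfter(Duration),
}

/// Assumption: this substring is copied from the message quoted in this issue.
const EXPECTED_CLOSURE_MARKER: &str = "closed for an expected reason";

/// Returns true only for an Unavailable status whose message carries the marker.
pub fn is_expected_closure(status: &Status) -> bool {
    status.code == Code::Unavailable && status.message.contains(EXPECTED_CLOSURE_MARKER)
}

/// Maps a status to a reaction; `retry_delay` plays the role of retry_delay_secs.
pub fn classify(status: &Status, retry_delay: Duration) -> Reaction {
    if is_expected_closure(status) {
        Reaction::ReconnectNow
    } else {
        Reaction::ReportAndRetryAfter(retry_delay)
    }
}
```

With this shape, the status from the log above would map to Reaction::ReconnectNow, while any other Unavailable status would still be reported and retried after the delay. Checking both the code and the message keeps unrelated Unavailable errors on the existing error path.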

References

https://cloud.google.com/pubsub/docs/pull#streamingpull
