A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
The gcp_pubsub source emits ERROR-level logs and increments component_errors_total when Google's Pub/Sub server closes a StreamingPull connection for an expected reason. The error message:
ERROR vector::internal_events::gcp_pubsub: Failed to fetch events. error=status: Unavailable, message: "The StreamingPull stream closed for an expected reason and should be recreated, which is done automatically if using Cloud Pub/Sub client libraries. Refer to https://cloud.google.com/pubsub/docs/pull#streamingpull for more information." details: [], metadata: MetadataMap { headers: {"content-disposition": "attachment"} } error_code="failed_fetching_events" error_type="request_failed" stage="receiving"
Google's documentation describes these periodic stream closures as expected behavior:
"The StreamingPull API keeps an open connection. The Pub/Sub servers recurrently close the connection after a time period to avoid a long-running sticky connection. The client library automatically reopens a StreamingPull connection."
https://cloud.google.com/pubsub/docs/pull#streamingpull
The error message itself says the closure is "for an expected reason." However, Vector treats it as a real error because translate_error() only special-cases HTTP/2-level resets via is_reset(), and this closure arrives as a gRPC-level Unavailable status, which is_reset() does not match.
Impact
- False alerting: ERROR logs trigger monitoring alerts for routine server behavior.
- Misleading metrics: component_errors_total is inflated, making it unreliable for detecting real problems.
- Unnecessary retry delay: Vector waits retry_delay_secs (default 1s) before reconnecting, when an immediate reconnect would be appropriate.
Configuration
sources:
  logs_from_gcp:
    type: gcp_pubsub
    project: "my-gcp-project"
    subscription: "my-subscription"
Version
v0.55.0
Debug Output
Example Data
No response
Additional Context
Root Cause
In src/sources/gcp_pubsub.rs, translate_error() delegates to is_reset(), which only detects HTTP/2-level resets by traversing the error source chain (tonic::Status → hyper::Error → h2::Error). The GCP expected closure arrives as a gRPC-level tonic::Status with Code::Unavailable, so is_reset() returns false and the error falls through to GcpPubsubReceiveError (ERROR log + counter increment).
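One possible shape for a fix is sketched below. It is not Vector's actual code. To stay self-contained it uses small stand-in types instead of tonic::Status and tonic::Code, and the names Status, Code, Action, classify, is_expected_close, and EXPECTED_CLOSE_MARKER are all hypothetical. It also assumes that matching on the message text is acceptable. That is fragile if Google changes the wording, so a maintainer may prefer a different signal.

```rust
// Hypothetical stand-ins for tonic::Code and tonic::Status, reduced to
// what this sketch needs. The real types carry more information.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code {
    Unavailable,
    PermissionDenied,
}

struct Status {
    code: Code,
    message: String,
}

// Hypothetical outcome of classifying a receive error.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    // Expected closure: reconnect immediately, no ERROR log or counter.
    ReconnectNow,
    // Real failure: report it and wait for the retry delay.
    ReconnectAfterDelay,
}

// Assumption: the phrase from the log line in this issue is stable enough
// to identify the expected closure.
const EXPECTED_CLOSE_MARKER: &str = "closed for an expected reason";

// True when the status looks like the periodic server-side closure
// described in the Pub/Sub StreamingPull documentation.
fn is_expected_close(status: &Status) -> bool {
    status.code == Code::Unavailable && status.message.contains(EXPECTED_CLOSE_MARKER)
}

// `http2_reset` stands in for the result of the existing is_reset() check.
fn classify(status: &Status, http2_reset: bool) -> Action {
    if http2_reset || is_expected_close(status) {
        Action::ReconnectNow
    } else {
        Action::ReconnectAfterDelay
    }
}
```

With this classification, an Unavailable status carrying the "expected reason" message takes the same immediate-reconnect path as an HTTP/2 reset, while any other Unavailable status (for example, a genuine outage) still goes through the error path with its retry delay.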
References
https://cloud.google.com/pubsub/docs/pull#streamingpull