
golang: replace per-cgo-call destroy mutex with std::atomic<bool> #44662

Open

wdauchy wants to merge 3 commits into envoyproxy:main from wdauchy:optimize/golang-filter-atomic-destroyed

Conversation

wdauchy (Contributor) commented Apr 26, 2026

Commit Message:
Every CAPI call from Go into the C++ filter (getHeader, copyHeaders, setHeader, addData, continueStatus, ...) was acquiring Filter::mutex_ purely to read the has_destroyed_ flag. Off-thread Go callers paid the cost twice: once at the entry point and again inside the dispatcher-post lambda's hasDestroyed() check. On uncontended fast paths this is two atomic compare-and-swap operations and a memory fence per call, on top of the cgo boundary cost that's already paid for every API call.

Make has_destroyed_ a std::atomic<bool> with release-store on the single onDestroy() write and acquire-load on the read path. This turns the destroy check into a plain atomic load (free on x86, an ldar on ARM) and removes the mutex acquisition from 18 CAPI entry points and ~12 dispatcher-post lambdas. The Filter is kept alive across cgo calls by the shared_ptr taken at the cgo wrapper layer (cgo.cc), so seeing a false-then-true transition mid-call is benign; the lambdas all also check weak_from_this() expiration.

Filter::mutex_ is retained on the five CAPI methods that write to req_->strValue (getStringValue, getDynamicMetadata, getStringFilterState, getStringProperty, getSecret) to keep the existing serialization between off-thread Go callers writing to the shared scratch buffer.
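
For illustration, the overall shape is roughly the following (a sketch, not the actual filter code: has_destroyed_, hasDestroyed(), onDestroy(), mutex_, req_->strValue and CAPIFilterIsDestroy are the names discussed above, while the CAPIOK value, the string lookup, and the use of plain std::mutex in place of Envoy's Thread::LockGuard are simplifications):

  #include <atomic>
  #include <cstdint>
  #include <mutex>
  #include <string>

  enum class CAPIStatus { CAPIOK, CAPIFilterIsDestroy };  // value names assumed

  struct RequestSketch {
    std::string strValue;  // shared scratch buffer written by several CAPI getters
  };

  class FilterSketch {
  public:
    // Read path: every CAPI entry point checks the flag with an acquire load,
    // which pairs with the release store in onDestroy(); no mutex is needed
    // just to observe the flag.
    bool hasDestroyed() const { return has_destroyed_.load(std::memory_order_acquire); }

    // Single writer: the worker thread flips the flag once during stream teardown.
    void onDestroy() {
      has_destroyed_.store(true, std::memory_order_release);
      // ... notify the Go side so blocked goroutines can unwind ...
    }

    // One of the five strValue writers: the lock is retained here, not for the
    // destroy flag, but to serialise off-thread Go callers that all write into
    // the same scratch buffer.
    CAPIStatus getStringValueSketch(uint64_t* value_data, int* value_len) {
      std::lock_guard<std::mutex> lock(mutex_);
      if (hasDestroyed()) {
        return CAPIStatus::CAPIFilterIsDestroy;
      }
      req_.strValue = "looked-up value";  // stands in for the real lookup
      *value_data = reinterpret_cast<uint64_t>(req_.strValue.data());
      *value_len = static_cast<int>(req_.strValue.length());
      return CAPIStatus::CAPIOK;
    }

  private:
    std::atomic<bool> has_destroyed_{false};
    std::mutex mutex_;
    RequestSketch req_;
  };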

No behaviour change for unit/integration tests; existing onDestroy() ordering and the post-destroy CAPIFilterIsDestroy semantics are preserved.

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #44662 was opened by wdauchy.

see: more, trace.

@wdauchy wdauchy marked this pull request as ready for review April 26, 2026 09:24
ravenblackx previously approved these changes Apr 27, 2026
Comment on lines +419 to +421
// CAPIFilterIsDestroy as soon as they observe the store. The Filter itself is kept alive across
// any cgo call by the shared_ptr taken at the cgo wrapper layer (see cgo.cc), so observing a
// false-then-true transition during a call is benign.
Contributor

This comment is very helpful. :)

ravenblackx (Contributor)

Assigning a couple of the golang extension owners to review. Once either of them approves, I'll merge it.


CAPIStatus Filter::getHeader(ProcessorState& state, absl::string_view key, uint64_t* value_data,
int* value_len) {
Thread::LockGuard lock(mutex_);
Member

I'm not sure it's safe enough. The Filter is kept alive by the shared_ptr, but does that mean it's safe to read the headers from another Go thread? I'm not sure about it.

Could this happen: the Filter is alive, but the headers have been freed by the onDestroy pipeline?

wdauchy (Contributor Author) commented Apr 28, 2026

Good question, and it's worth being explicit about it. Short version: this PR doesn't change the lifetime invariant for state.headers; the previous mutex_ never participated in that invariant either.

What mutex_ was protecting before was strictly the read of has_destroyed_, with the ABSL_GUARDED_BY annotation only on that flag. Once the lock was released at the end of the entry function (or in the middle of continueStatus/sendLocalReply), the off-thread Go thread continued into m->get(LowerCaseString(key)) without holding any lock that would prevent the header map from being freed by the worker thread. So the scenario you describe, "Filter alive, headers freed by onDestroy pipeline, off-thread Go thread reads m", was already not synchronised by mutex_ in the old code; it's synchronised by a different invariant entirely.

The invariant that protects state.headers from a use-after-free is:

  1. state.headers = &headers is set in Filter::doHeaders (worker thread), and cleared (state.headers = nullptr) only when Go returns synchronously with a non-Running status, i.e. done == true. While done == false (Go went async), the worker thread parks the stream and state.headers stays valid.
  2. All CAPI entry points additionally check state.isProcessingInGo(), which returns true exactly when filterState() is ProcessingHeader/ProcessingData/ProcessingTrailer, i.e. the same window during which state.headers is non-null.
  3. While processing is active, the filter chain holds the header map alive: onDestroy() is only invoked by the chain when the stream is being torn down. When that happens during async processing, the C++ side calls envoyGoFilterOnHttpDestroy which on the Go side runs req.resumeWaitCallback(), waking any off-thread goroutine blocked on the callback condition. The Go side then bails with CAPIFilterIsDestroy on its next CAPI return.

So the question for the off-thread call window is: between state.isProcessingInGo() returning true and the actual m->get(...) dereference, can the header map go away? The answer is no, because the worker thread has not yet returned from the doHeaders/doData/doTrailer parking and therefore has not yet entered any code that would free the header map for this stream. A reset that arrives during async processing causes onDestroy() to flip has_destroyed_ and notify Go, but it does not synchronously free the header map; that happens later in the request lifecycle, after the parked stream has been unwound. Any Go thread that was racing in getHeader either:

  • already loaded m and is reading from a still-live header map, or
  • has not yet loaded m, will see has_destroyed_ == true on the next attempt, and will get CAPIFilterIsDestroy.

The atomic has_destroyed_ check is sufficient for the flag race; the header map lifetime race is governed by the parking semantics above and was not improved or weakened by this PR.
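
For concreteness, the check order on the off-thread read path described above is roughly the following (a self-contained sketch with mocked types; the real code uses Filter, ProcessorState and Http::HeaderMap, and the status names are assumed):

  #include <map>
  #include <string>

  // Mocked stand-ins for the Envoy objects discussed above.
  struct HeaderMapSketch {
    std::map<std::string, std::string> entries;
  };

  struct ProcessorStateSketch {
    bool processing_in_go = false;        // mirrors state.isProcessingInGo()
    HeaderMapSketch* headers = nullptr;   // set in doHeaders, cleared when Go returns done
  };

  enum class StatusSketch { Ok, FilterIsDestroy, NotInGo, NotFound };

  StatusSketch getHeaderSketch(bool has_destroyed, const ProcessorStateSketch& state,
                               const std::string& key, std::string& out) {
    if (has_destroyed) {
      return StatusSketch::FilterIsDestroy;       // 1. atomic destroy flag observed first
    }
    if (!state.processing_in_go || state.headers == nullptr) {
      return StatusSketch::NotInGo;               // 2. only valid while the stream is parked in Go
    }
    auto it = state.headers->entries.find(key);   // 3. the dereference under discussion
    if (it == state.headers->entries.end()) {
      return StatusSketch::NotFound;
    }
    out = it->second;
    return StatusSketch::Ok;
  }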

That said, I think it's worth making this explicit in the code. I can add a comment block above getHeader (or, more generally, near state.isProcessingInGo() in processor_state.h) describing this invariant so future readers don't have to reconstruct it. Happy to push that in this PR if you want, just let me know which placement you prefer.

Member

Thanks for the detailed explanation! I appreciate you walking through the invariants.
After reading the codebase, I found there might be a use-after-free:

  off-thread goroutine                          worker thread
  --------------------------                    ----------------------------
  cgo entry:
    weak_filter.lock() → shared_ptr<Filter>
    filter->getHeader(state, key, ...)                                                                                                                                                
      hasDestroyed() → false
      isProcessingInGo() → true                                                                                                                                                       
      m = state.headers   (raw ptr, valid)                            
      m->get(LowerCaseString(key))                                                                                                                                                  
        ... OS preempts goroutine here ...
                                                stream reset arrives
                                                doDeferredStreamDestroy():
                                                  destroyFilters():
                                                    Filter::onDestroy():
                                                      has_destroyed_.exchange(true) ← no lock, no block
                                                      envoyGoFilterOnHttpDestroy() → Go: OnDestroy
                                                    onDestroy returns
                                                  destroyFilters returns
                                                  deferredDelete(stream)
                                                ... other event callbacks ...
                                                event loop tail:
                                                  process deferred-delete queue
                                                  ActiveStream destructs
                                                    → HeaderMap freed ← !!

        ... goroutine resumes ...
        reading freed HeaderMap memory → UAF

The critical point: we cannot assume m->get() is fast. The off-thread goroutine runs on a Go runtime thread, subject to OS scheduling. It can be preempted at any instruction boundary, and the entire event loop iteration (including deferred deletes) can complete before it resumes.

With the old mutex_, the worker thread would have blocked at step onDestroy() → LockGuard lock(mutex_), stalling the entire destruction pipeline until the goroutine finished reading. That protection is removed with the atomic.
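
A minimal sketch of that stall, assuming std::mutex in place of Envoy's mutex type and with all names illustrative:

  #include <atomic>
  #include <mutex>

  // Old locking discipline: the off-thread reader holds mutex_ across the whole
  // dereference, so onDestroy() — which also takes mutex_ — cannot return, and
  // the destruction pipeline cannot reach deferredDelete()/~ActiveStream, until
  // the in-flight reader has unwound.
  struct StallSketch {
    std::mutex mutex_;
    std::atomic<bool> has_destroyed_{false};

    void readHeadersLocked() {                     // off-thread Go caller
      std::lock_guard<std::mutex> lock(mutex_);
      if (has_destroyed_.load(std::memory_order_acquire)) {
        return;                                    // CAPIFilterIsDestroy in the real code
      }
      // ... m->get(...) happens here, still under the lock ...
    }

    void onDestroy() {                             // worker thread, during teardown
      std::lock_guard<std::mutex> lock(mutex_);    // blocks until any reader releases
      has_destroyed_.store(true, std::memory_order_release);
    }
  };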

The atomic conversion is correct and beneficial for the majority of CAPI methods — especially the write paths that dispatcher.post() to the worker thread, and methods that only touch Filter-owned state. I'd suggest keeping mutex_ only for the small set of methods that inline-dereference Envoy-owned objects from off-thread: getHeader, copyHeaders, copyTrailers, and getIntegerValue. This preserves most of the performance win while closing the lifetime gap.

Happy to discuss further — it's a subtle corner and I want to make sure we get it right.

wdauchy (Contributor Author)

You're right, sorry for the misleading reply earlier. I missed that the old getHeader held mutex_ across m->get(...) itself, not just the destroy-flag check, which is exactly what makes the worker-thread stall on onDestroy()'s LockGuard a real lifetime barrier. Your timeline is correct: without that stall, the goroutine can be preempted between loading m = state.headers and dereferencing it, and the worker can run through destroyFilters -> deferred-delete -> ~ActiveStream -> header-map free in the meantime.

Pushed a follow-up commit that restores the lock on the four methods you named (getHeader, copyHeaders, copyTrailers, getIntegerValue), with inline comments explaining the worker-stall role at each callsite, and a longer block on mutex_'s declaration documenting its two distinct roles (strValue serialisation + worker-stall fence). Changelog narrowed accordingly. Everything else stays lock-free: writes that go through dispatcher.post, buffer methods on Filter-owned doDataList, the worker-thread inline branches, and the dispatcher-post lambdas' hasDestroyed() re-check.

Happy to move the longer comment to processor_state.h near isProcessingInGo() if you'd prefer it there.

The previous commit dropped Filter::mutex_ from every CAPI entry point,
on the basis that has_destroyed_ no longer needed lock-based protection.
That reasoning was incomplete: for CAPI methods that inline-dereference
Envoy-owned objects whose lifetime is tied to the parent stream (rather
than to the Filter), the mutex was not just guarding the destroy flag,
it was also stalling the worker thread inside Filter::onDestroy() until
every in-flight off-thread reader had unwound. Without that stall, the
worker thread can return from onDestroy(), continue past
destroyFilters(), reach deferredDelete(stream), and at the event-loop
tail destruct ActiveStream and free the HeaderMap / TrailerMap /
StreamInfo while an off-thread Go goroutine is mid-dereference. The
goroutine is just a regular OS-scheduled thread and can be preempted at
any instruction boundary, so this is reachable in practice.

Restore the lock on the four CAPI methods that perform such inline
dereferences from off-thread:

  - getHeader:       reads state.headers (Http::HeaderMap)
  - copyHeaders:     iterates state.headers
  - copyTrailers:    iterates state.trailers (Http::HeaderMap)
  - getIntegerValue: reads streamInfo() and SSL/upstream sub-objects

All other entry points remain lock-free: writes that go through
dispatcher.post (setHeader, removeHeader, setTrailer, removeTrailer,
setUpstreamOverrideHost, setDynamicMetadata, setStringFilterState,
clearRouteCache), worker-thread-only inline branches (addData when
isThreadSafe), buffer methods that operate on Filter-owned doDataList
(copyBuffer, drainBuffer, setBufferHelper), and continueStatus /
sendLocalReply / setDrainConnectionUponCompletion. The five strValue
writers (getStringValue, getDynamicMetadata, getStringFilterState,
getStringProperty, getSecret) keep the lock for both their original
strValue serialisation purpose and the worker-stall purpose.
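
The lock-free write-path shape described above looks roughly like this (a sketch; dispatcher.post, weak_from_this and the hasDestroyed re-check are the mechanisms named in this commit message, everything else is illustrative):

  #include <atomic>
  #include <functional>
  #include <memory>
  #include <string>

  // Minimal stand-in for Event::Dispatcher: the real post() queues the callback
  // onto the worker thread's event loop instead of running it inline.
  struct DispatcherSketch {
    void post(std::function<void()> cb) { cb(); }
  };

  class WritePathSketch : public std::enable_shared_from_this<WritePathSketch> {
  public:
    bool hasDestroyed() const { return has_destroyed_.load(std::memory_order_acquire); }

    // e.g. setHeader: no mutex at the entry point. The mutation is posted to the
    // worker thread, and the lambda re-checks liveness (weak_from_this) and the
    // destroy flag before touching any stream-owned state.
    void setHeaderSketch(DispatcherSketch& dispatcher, std::string key, std::string value) {
      if (hasDestroyed()) {
        return;  // CAPIFilterIsDestroy in the real code
      }
      auto weak = weak_from_this();
      dispatcher.post([weak, key = std::move(key), value = std::move(value)]() {
        auto self = weak.lock();
        if (self == nullptr || self->hasDestroyed()) {
          return;  // filter already gone or destroyed; nothing to touch
        }
        // ... mutate the header map with key/value here, on the worker thread ...
      });
    }

  private:
    std::atomic<bool> has_destroyed_{false};
  };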

Document mutex_'s two distinct roles on its declaration in the header,
add inline rationale comments at each of the four restored callsites,
and narrow the changelog entry to reflect the actual scope of the
optimisation.

Reported-by: doujiang24 <doujiangduoduo@gmail.com>
Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
@wdauchy wdauchy force-pushed the optimize/golang-filter-atomic-destroyed branch from 6919563 to 814a7b5 on April 28, 2026 13:15
wdauchy (Contributor Author) commented Apr 28, 2026

/retest transients

}

{
Thread::LockGuard lock(mutex_);
Member

We still need it in onDestroy, to prevent the headers from being freed?
