Optimize isSwiftUI() by replacing symbol resolution with segment address checking #165
…ess checking Replace Thread.callStackSymbols (which calls expensive findClosestSymbol) with direct Mach-O segment parsing and address range checking. This avoids symbol resolution entirely while maintaining correctness. Ref: https://mjtsai.com/blog/2025/10/03/spamsieve-3-2-1/
let start = UInt(bitPattern: Int(segment.vmaddr) + slide)
let end = UInt(bitPattern: Int(segment.vmaddr + segment.vmsize) + slide)
Do we need to convert to UInt before arithmetic to avoid overflow? (Ditto below)
The bitPattern init takes an Int, so I think this is OK.
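For reference, a minimal sketch of how the sum could be written so that no intermediate conversion can trap. `Int(segment.vmaddr)` traps at runtime if the value does not fit in `Int`; user-space addresses should stay below that in practice, so this is defensive rather than a fix for an observed bug. The helper name `slidAddress` is hypothetical and not part of the PR.

```swift
// Hypothetical helper (not from the PR): applies the ASLR slide to an unslid
// address using wrapping arithmetic, so no intermediate conversion can trap.
func slidAddress(vmaddr: UInt64, slide: Int) -> UInt {
  // Reinterpret both operands as UInt; a negative slide becomes its
  // two's-complement bit pattern, which a wrapping add handles correctly.
  UInt(truncatingIfNeeded: vmaddr) &+ UInt(bitPattern: slide)
}

// Example: a positive and a negative slide.
let forward = slidAddress(vmaddr: 0x1000, slide: 0x100)
let backward = slidAddress(vmaddr: 0x1000, slide: -0x100)
```

The wrapping operator `&+` is chosen so that the negative-slide case works without going through a signed intermediate.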
var attributeGraphRanges: [(UInt, UInt)] = []
let imageCount = _dyld_image_count()
for i in 0..<imageCount {
Should attributeGraphRanges be cached for reuse across multiple locations?
I pushed a quick refactor to cache things.
@robmaceachern Did a cleanup/optimization pass. Want to take things for another spin to make sure I didn't mess anything up?
}
}

private let attributeGraphAddresses: RangeSet = {
some thoughts on the caching:
- is it conceivable that this could ever return an empty range but SwiftUI/AG will show up at some future point? like is there a circumstance where we get to this function's executing but somehow the library we're looking for has not yet been loaded? i imagine it's unlikely if at all possible...
- if that is possible, perhaps we could force it to load by explicitly looking up a symbol via dladdr() or something?
- can the address range data be cached globally, rather than per-instance?
I think 1 is unlikely but I'm not sure it can be totally ruled out. I don't know if we can safely force load it since afaik none of the symbols are public.
can the address range data be cached globally, rather than per-instance?
I think that's probably a good idea.
can the address range data be cached globally, rather than per-instance?
Actually, I think it already is! The indentation kind of threw me off for a second but it's not nested inside PerceptionRegistrar.
let imageCount = _dyld_image_count()
for i in 0..<imageCount {
guard let imageName = _dyld_get_image_name(i) else { continue }
if String(cString: imageName).hasSuffix("/AttributeGraph") {
i know it wasn't changed here, but why look for AG specifically vs something in SwiftUI/SwiftUICore?
Just guessing: probably an issue with additional false-positives in the non-View bits of SwiftUI.
This debug code is specifically for letting folks know when a property is accessed from a SwiftUI view that was not wrapped in the WithPerceptionTracking observable view.
edit: replied to the wrong thread... moving the comment
private let attributeGraphAddresses: RangeSet = {
var addresses = RangeSet()
let imageCount = _dyld_image_count()
i see this in the header for many of these methods:
/*
* The following functions allow you to iterate through all loaded images.
* This is not a thread safe operation. Another thread can add or remove
* an image during the iteration.
*
* Many uses of these routines can be replace by a call to dladdr() which
* will return the mach_header and name of an image, given an address in
* the image. dladdr() is thread safe.
*/
not sure if the slide value is accessible via other means though...
Yeah I guess it's a risk. Maybe the AttributeGraph image could get pushed out of the imageCount range and our checks would be wrong (and cached wrong forever). Seems like low probability and low impact though.
@jamieQ We're open to improvements here, but also this is debug-only code that can be disabled, so we're not sure there's too much of a risk here, especially since AttributeGraph is likely loaded early and forever in a SwiftUI application.
that makes sense. if we're not too concerned with the risks here this seems like this might be fine. it does feel like we should be able to make the runtime do most of this work for us in some manner (e.g. dlopen the private framework, dlsym a symbol name we expect to 'always' exist in AttributeGraph or something along those lines), but maybe this is good enough.
i think the bigger concern than a logical race would be if there is potential for causing data races or crashes via use of these API. e.g. the OSS distributions of dyld suggest there's some risk that reading from the underlying image vector via the _dyld_image_count() API could race on the underlying storage if it were to be resized concurrently, though TBH i'm not familiar enough with C++ to have a sense of how much of a concern this might be in practice. since it looks like the implementation defends against indexing into the loaded images vector with a bad value, that failure mode is presumably eliminated (which was originally my primary worry).
mostly out of curiosity, i messed around a bit with some of the other dyld APIs, and i think there are at least two alternatives that could work to sidestep the thread safety risks. the first is to use dlopen + dlsym + dladdr to get the mach header of the private library we want to derive the address ranges for. something like:
// get a handle to the library that we expect to exist. don't load it if it isn't yet (unlikely)
let expectedAGPath = "/System/Library/PrivateFrameworks/AttributeGraph.framework/AttributeGraph"
guard let handle = dlopen(expectedAGPath, RTLD_NOLOAD) else {
return nil
}
defer { dlclose(handle) }
// look up a symbol we think should exist in the library
guard let symbol = dlsym(handle, "AGGraphCreate") else {
return nil
}
// get address info for the symbol, which will give us the base address of the library
var symbolInfo = Dl_info()
guard dladdr(symbol, &symbolInfo) != 0 else { // 0 is failure
return nil
}
guard let baseAddress = symbolInfo.dli_fbase else {
return nil
}
// Parse Mach-O header to get the size
let header = baseAddress.assumingMemoryBound(to: mach_header_64.self)
// some similar logic as below to derive the address range from the header
// ...
the second would be to use the _dyld_register_func_for_add_image() function to immediately get callbacks with the mach headers and slide values for all currently loaded images. within the implementation of the callback function we'd presumably have to do something like a dladdr() of the mach header we're passed, and then perform logic similar to what we have here (skip the library names we don't care about, then derive and cache the address ranges we want to handle). i didn't test this one too much, but i think it would be structurally quite similar to the current approach, but with fewer underscored API calls. once the callback is registered i don't think it can be unregistered though, so there isn't really a way to 'exit early' once you set it up, and it will continue to get callbacks if future images are loaded during execution (so the implementation would have to deal with that).
one additional thought on the ASLR 'slide' values... anecdotally, when testing, the slide values for stuff from the 'dyld shared cache' seemed to be reported as zero (which is where i would expect all the system framework stuff to be on Darwin targets). i'm not sure exactly what that means or implies though TBH. if we need to figure out the slide value and can't just assume it will be zero in this case, the first approach might not work, if we do, the second one should provide the value in the callback function.
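A rough, untested sketch of what the second approach might look like. The type `LoadedImage`, the globals, and the function names are hypothetical. The callback is a C function pointer, so it cannot capture local state, which is why the result lives in a lock-protected global here. Deriving segment ranges from the stored header would follow the same logic the PR already uses.

```swift
#if canImport(Darwin)
import Foundation
import MachO

// Hypothetical container for what the callback hands us.
struct LoadedImage {
  let header: UnsafePointer<mach_header>
  let slide: Int
}

// Hypothetical global storage; callbacks may arrive on any thread that loads an image.
private let imageLock = NSLock()
private var attributeGraphImage: LoadedImage?

// Registers the callback. dyld calls it once for every image that is already
// loaded and again for each image loaded later; it cannot be unregistered.
func startWatchingForAttributeGraph() {
  _dyld_register_func_for_add_image { header, slide in
    guard let header = header else { return }
    var info = Dl_info()
    // dladdr() maps the header address back to the image's path.
    guard dladdr(header, &info) != 0,
      let path = info.dli_fname,
      String(cString: path).hasSuffix("/AttributeGraph")
    else { return }
    imageLock.lock()
    attributeGraphImage = LoadedImage(header: header, slide: slide)
    imageLock.unlock()
  }
}

// Returns the recorded image, or nil if AttributeGraph has not been seen yet.
func currentAttributeGraphImage() -> LoadedImage? {
  imageLock.lock()
  defer { imageLock.unlock() }
  return attributeGraphImage
}
#endif
```

A caller would invoke `startWatchingForAttributeGraph()` once and then treat a nil result from `currentAttributeGraphImage()` as "not loaded yet" rather than caching an empty answer forever.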
Thanks for looking into things!
We'd be down for the first approach if you're open to PR it. My main question about the snippet above is if the code should eager/lazy-load AttributeGraph as a precaution, rather than avoiding it via RTLD_NOLOAD, since these addresses will be cached (unless you're suggesting always loading the addresses fresh each isSwiftUI check?).
We've decided to merge this PR as is since it's a huge improvement to developer QoL, but don't take that to mean we're not interested in more improvements down the line.
I don't think there's any rush to merge/release, so I'll let things simmer over the weekend. If anyone wants to suggest any more improvements (@jamieQ?) please do!
Optimize isSwiftUI() performance by avoiding expensive symbol resolution
Warning
This is just for discussion at this point. This was implemented with an AI agent and I haven't fully digested it, but it does seem to avoid the severe performance issues we were seeing and all existing iOS tests pass. macOS tests fail but they also fail on main today.
AI-generated summary in details.
Details
Summary
Profiling revealed that isSwiftUI() was a performance bottleneck due to expensive symbol resolution calls. This PR replaces the symbol-based approach with direct memory address checking against Mach-O segment ranges, providing significant performance improvements while maintaining correctness.
Problem
The isSwiftUI() method detects whether code is being called from SwiftUI's AttributeGraph rendering pipeline (to warn developers about untracked state access). The original implementation worked like this:
Why This Was Slow
When you call Thread.callStackSymbols, Swift needs to convert memory addresses into human-readable function names. This triggers a chain of expensive operations:
- backtrace_symbols() - Library call to symbolicate addresses
- dladdr() - Looks up which library owns each address
- findClosestSymbol() - Scans symbol tables to find function names
Profiling showed that findClosestSymbol() was the bottleneck, as it needs to search through potentially thousands of symbols for every stack frame. This happened on every property access in debug builds, making the app noticeably slower.
Solution
Instead of converting addresses to symbol names, we can check addresses directly against the memory ranges where AttributeGraph is loaded. Think of it like checking if a house number is on Main Street without needing to look up the resident's name.
How It Works
1. Find AttributeGraph in Memory
macOS loads frameworks (like AttributeGraph) into specific memory regions. We scan all loaded frameworks using dyld (the dynamic linker):
2. Parse Mach-O Segments
Each framework is a Mach-O binary (macOS's executable format). These binaries are divided into segments - contiguous blocks of memory for different purposes:
- __TEXT - Executable code
- __DATA - Mutable data
- __LINKEDIT - Dynamic linking info
We parse these segments from the Mach-O header to get their exact address ranges:
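A sketch of how steps 1 and 2 could fit together, under stated assumptions: it is not the PR's exact code, the function name `segmentRanges(ofImageWithSuffix:)` is hypothetical, and it handles only 64-bit images. It also inherits the thread-safety caveat that the dyld header attaches to the image-iteration functions.

```swift
#if canImport(Darwin)
import Foundation
import MachO

// Hypothetical helper: returns the slid address range of every LC_SEGMENT_64
// in each loaded image whose path ends with `suffix`.
func segmentRanges(ofImageWithSuffix suffix: String) -> [Range<UInt>] {
  var ranges: [Range<UInt>] = []
  for index in 0..<_dyld_image_count() {
    guard let name = _dyld_get_image_name(index),
      String(cString: name).hasSuffix(suffix),
      let header = _dyld_get_image_header(index),
      header.pointee.magic == MH_MAGIC_64  // 64-bit Mach-O only
    else { continue }
    let slide = _dyld_get_image_vmaddr_slide(index)
    // Load commands start immediately after the 64-bit header.
    var cursor = UnsafeRawPointer(header).advanced(by: MemoryLayout<mach_header_64>.size)
    for _ in 0..<header.pointee.ncmds {
      let command = cursor.load(as: load_command.self)
      if command.cmd == UInt32(LC_SEGMENT_64) {
        let segment = cursor.load(as: segment_command_64.self)
        // Wrapping arithmetic avoids trapping on unusual values.
        let start = UInt(truncatingIfNeeded: segment.vmaddr) &+ UInt(bitPattern: slide)
        let end = start &+ UInt(truncatingIfNeeded: segment.vmsize)
        if start < end { ranges.append(start..<end) }
      }
      cursor = cursor.advanced(by: Int(command.cmdsize))
    }
  }
  return ranges
}
#endif
```

Calling `segmentRanges(ofImageWithSuffix: "/AttributeGraph")` would return one range per segment, which keeps any gaps between segments out of the result.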
3. Check Return Addresses
Now we can check if any address in our call stack falls within AttributeGraph's memory:
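A minimal sketch of the check, assuming the segment ranges have already been collected as `[Range<UInt>]`. The function names are hypothetical, and the PR itself stores the ranges in a `RangeSet` rather than an array.

```swift
import Foundation

// Hypothetical helper: true if any address falls inside any of the ranges.
func anyAddress(_ addresses: [UInt], isIn ranges: [Range<UInt>]) -> Bool {
  addresses.contains { address in
    ranges.contains { $0.contains(address) }
  }
}

// Hypothetical wrapper: checks the current thread's return addresses.
// No symbol names are resolved; only raw addresses are compared.
func callStackTouches(_ ranges: [Range<UInt>]) -> Bool {
  let addresses = Thread.callStackReturnAddresses.map { $0.uintValue }
  return anyAddress(addresses, isIn: ranges)
}
```

Splitting the pure comparison from the stack capture keeps the comparison easy to unit test with fixed addresses.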
Critical Fix: Individual Segments vs. Giant Range
An early version of this PR used min(all segments) to max(all segments), creating one giant range. This caused test failures because segments aren't always contiguous:
Using one range (0x1c00c1000 - 0x2297c4000) would incorrectly match addresses in the 1.6GB gap! The fix is checking each segment individually.
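The difference can be illustrated with a small sketch. Only the outer bounds 0x1c00c1000 and 0x2297c4000 come from the description above; the inner segment boundaries and the probe address are made up for illustration.

```swift
// Two hypothetical segments. Only the lowest start and the highest end are
// taken from the PR description; the inner bounds are invented.
let segments: [Range<UInt>] = [
  0x1c00c1000..<0x1c00f1000,
  0x229790000..<0x2297c4000,
]

// The "giant range" approach: min of all starts to max of all ends.
let merged = segments.map(\.lowerBound).min()! ..< segments.map(\.upperBound).max()!

// Per-segment check: an address matches only if some segment contains it.
func isInside(_ address: UInt, _ ranges: [Range<UInt>]) -> Bool {
  ranges.contains { $0.contains(address) }
}

// A made-up address that sits in the gap between the two segments.
let probe: UInt = 0x200000000

print(merged.contains(probe))     // true: false positive from the giant range
print(isInside(probe, segments))  // false: the per-segment check rejects it
```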
Performance
Before:
After:
The new approach avoids all symbol table operations, providing significant speedup while maintaining accuracy.
Technical Notes
Memory Safety
The implementation uses UnsafePointer to read Mach-O headers, which is safe because:
- dyld-provided addresses (guaranteed valid)
ASLR (Address Space Layout Randomization)
For security, macOS randomizes where frameworks load in memory. The "slide" is this random offset:
We apply this slide to get actual loaded addresses.
Caching Strategy
Results are cached using Thread.callStackReturnAddresses.hashValue as a key. This means:
Testing
All existing tests pass. The individual segment checking ensures we correctly distinguish between:
References
This optimization was inspired by discussion of the same performance issue in SpamSieve:
https://mjtsai.com/blog/2025/10/03/spamsieve-3-2-1/