Optimize isSwiftUI() by replacing symbol resolution with segment address checking #165

Merged
stephencelis merged 5 commits into pointfreeco:main from robmaceachern:robmaceachern/isswiftui-improvements
Oct 14, 2025

@robmaceachern
Contributor

Optimize isSwiftUI() performance by avoiding expensive symbol resolution

Warning

This is just for discussion at this point. This was implemented with an AI agent and I haven't fully digested it, but it does seem to avoid the severe performance issues we were seeing and all existing iOS tests pass. macOS tests fail but they also fail on main today.

AI-generated summary in details.

Details

Summary

Profiling revealed that isSwiftUI() was a performance bottleneck due to expensive symbol resolution calls. This PR replaces the symbol-based approach with direct memory address checking against Mach-O segment ranges, providing significant performance improvements while maintaining correctness.

Problem

The isSwiftUI() method detects whether code is being called from SwiftUI's AttributeGraph rendering pipeline (to warn developers about untracked state access). The original implementation worked like this:

return Thread.callStackSymbols.reversed().contains { 
  $0.contains("AttributeGraph ")
}

Why This Was Slow

When you call Thread.callStackSymbols, Foundation has to convert each return address into a human-readable symbol name. This triggers a chain of expensive operations:

  1. backtrace_symbols() - Library call that symbolicates the addresses
  2. dladdr() - Looks up which library owns each address
  3. findClosestSymbol() - Scans symbol tables to find function names

Profiling showed that findClosestSymbol() was the bottleneck, as it needs to search through potentially thousands of symbols for every stack frame. This happened on every property access in debug builds, making the app noticeably slower.
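
For a rough feel of the gap, a timing sketch along these lines can compare the two Foundation APIs. The `measure` helper and the iteration count are made up for illustration, and the absolute numbers will vary by machine and build configuration.

```swift
import Foundation

/// Hypothetical helper: runs `body` `iterations` times and returns elapsed seconds.
func measure(iterations: Int, _ body: () -> Void) -> TimeInterval {
  let start = Date()
  for _ in 0..<iterations { body() }
  return Date().timeIntervalSince(start)
}

// Symbolicated frames: every call resolves each address to a symbol name.
let symbolTime = measure(iterations: 100) {
  _ = Thread.callStackSymbols
}

// Raw return addresses: no symbol lookup involved.
let addressTime = measure(iterations: 100) {
  _ = Thread.callStackReturnAddresses
}

print("callStackSymbols:         \(symbolTime)s")
print("callStackReturnAddresses: \(addressTime)s")
```

Running it inside a deep SwiftUI call stack in a debug build would be closer to the situation described above than a shallow command-line stack.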

Solution

Instead of converting addresses to symbol names, we can check addresses directly against the memory ranges where AttributeGraph is loaded. Think of it like checking if a house number is on Main Street without needing to look up the resident's name.

How It Works

1. Find AttributeGraph in Memory

Darwin platforms (iOS, macOS, and so on) load frameworks (like AttributeGraph) into specific memory regions. We scan all loaded images using dyld (the dynamic linker):

import MachO

let imageCount = _dyld_image_count()  // How many images are loaded?
for i in 0..<imageCount {
  // The image path comes back as a C string
  guard let cName = _dyld_get_image_name(i) else { continue }
  if String(cString: cName).hasSuffix("/AttributeGraph") {
    // Found it! Now get its memory address...
  }
}

2. Parse Mach-O Segments

Each framework is a Mach-O binary (the executable format on Apple platforms). These binaries are divided into segments - contiguous blocks of memory for different purposes:

  • __TEXT - Executable code
  • __DATA - Mutable data
  • __LINKEDIT - Dynamic linking info
  • etc.

We parse these segments from the Mach-O header to get their exact address ranges:

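// Start at the Mach-O header (inside the loop from step 1; assumes a 64-bit image)
guard let header = _dyld_get_image_header(i) else { continue }
let slide = _dyld_get_image_vmaddr_slide(i)  // ASLR offset (security randomization)

// Walk through the "load commands" that follow the header and describe the binary
var cursor = UnsafeRawPointer(header).advanced(by: MemoryLayout<mach_header_64>.size)
for _ in 0..<header.pointee.ncmds {
  let command = cursor.load(as: load_command.self)
  if command.cmd == UInt32(LC_SEGMENT_64) {  // It's a segment!
    let segment = cursor.load(as: segment_command_64.self)

    // Calculate where this segment is actually loaded:
    // vmaddr = address the segment was linked at (before sliding)
    // vmsize = size of the segment in memory
    let actualStart = UInt(bitPattern: Int(segment.vmaddr) + slide)
    let actualEnd = actualStart + UInt(segment.vmsize)

    ranges.append((actualStart, actualEnd))
  }
  // Each load command records its own size, which is the stride to the next one
  cursor = cursor.advanced(by: Int(command.cmdsize))
}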

3. Check Return Addresses

Now we can check if any address in our call stack falls within AttributeGraph's memory:

let addresses = Thread.callStackReturnAddresses  // Just raw pointers, boxed as NSNumber
return addresses.contains { address in
  let pc = address.uintValue
  return attributeGraphRanges.contains { start, end in
    pc >= start && pc < end  // Simple integer comparison!
  }
}
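
Putting the three steps together, a self-contained sketch might look like the following. It is not the code merged in this PR: the helper names `segmentRanges(forImageWithSuffix:)` and `callStackIntersects(_:)` are hypothetical, it assumes 64-bit Mach-O images, and it uses wrapping arithmetic for the slide as a precaution.

```swift
import Foundation
import MachO

/// Hypothetical helper: loaded address ranges of every segment of the first
/// image whose path ends with `suffix`. Assumes 64-bit Mach-O images.
func segmentRanges(forImageWithSuffix suffix: String) -> [Range<UInt>] {
  var ranges: [Range<UInt>] = []
  for i in 0..<_dyld_image_count() {
    guard
      let cName = _dyld_get_image_name(i),
      String(cString: cName).hasSuffix(suffix),
      let header = _dyld_get_image_header(i)
    else { continue }

    let slide = UInt(bitPattern: _dyld_get_image_vmaddr_slide(i))
    // Load commands start immediately after the 64-bit header.
    var cursor = UnsafeRawPointer(header)
      .advanced(by: MemoryLayout<mach_header_64>.size)
    for _ in 0..<header.pointee.ncmds {
      let command = cursor.load(as: load_command.self)
      if command.cmd == UInt32(LC_SEGMENT_64) {
        let segment = cursor.load(as: segment_command_64.self)
        // Wrapping arithmetic so an unusual slide cannot trap.
        let start = UInt(segment.vmaddr) &+ slide
        let end = start &+ UInt(segment.vmsize)
        if start < end { ranges.append(start..<end) }
      }
      cursor = cursor.advanced(by: Int(command.cmdsize))
    }
    break  // Only the first matching image is needed.
  }
  return ranges
}

/// True if any return address on the current call stack lies in `ranges`.
func callStackIntersects(_ ranges: [Range<UInt>]) -> Bool {
  Thread.callStackReturnAddresses.contains { address in
    ranges.contains { $0.contains(address.uintValue) }
  }
}

let attributeGraphRanges = segmentRanges(forImageWithSuffix: "/AttributeGraph")
print(callStackIntersects(attributeGraphRanges))
```

In a process that never loads AttributeGraph (a plain command-line tool, for instance) the range list is empty and the check simply returns false.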

Critical Fix: Individual Segments vs. Giant Range

An early version of this PR used min(all segments) to max(all segments), creating one giant range. This caused test failures because segments aren't always contiguous:

Segment 1: 0x1c00c1000 - 0x1c00c5000 (TEXT)
[gap with other frameworks]
Segment 2: 0x2297c0000 - 0x2297c4000 (LINKEDIT)

Using one range (0x1c00c1000 - 0x2297c4000) would incorrectly match addresses in the 1.6GB gap! The fix is checking each segment individually.
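
The false positive is easy to reproduce with the two example segments above. In this sketch the address `0x200000000` is a made-up value chosen only because it falls in the gap.

```swift
// The two example segments from above, as half-open address ranges.
let text: Range<UInt> = 0x1c00c1000..<0x1c00c5000
let linkedit: Range<UInt> = 0x2297c0000..<0x2297c4000
let segments = [text, linkedit]

// Early approach: one range from the lowest start to the highest end.
let merged = segments.map(\.lowerBound).min()!..<segments.map(\.upperBound).max()!

// A made-up address that sits in the gap between the two segments.
let addressInGap: UInt = 0x200000000

let mergedMatch = merged.contains(addressInGap)
let perSegmentMatch = segments.contains { $0.contains(addressInGap) }

print(mergedMatch, perSegmentMatch)  // prints "true false"
```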

Performance

Before:

  • Symbol resolution for every stack frame
  • Thousands of symbol table lookups
  • User-visible slowness in debug builds

After:

  • Scan ~300-500 framework names (fast string checks)
  • Parse ~5-10 segments for AttributeGraph (simple memory reads)
  • Check ~20-40 addresses (integer comparisons)
  • Results cached by call stack hash (subsequent calls are instant)

The new approach avoids all symbol table operations, providing significant speedup while maintaining accuracy.

Technical Notes

Memory Safety

The implementation uses UnsafePointer to read Mach-O headers, which is safe because:

  • We're reading from dyld-provided addresses (guaranteed valid)
  • We only read, never write
  • We check bounds (number of commands, segment sizes)

ASLR (Address Space Layout Randomization)

For security, macOS randomizes where frameworks load in memory. The "slide" is this random offset:

Binary says: TEXT starts at 0x100000000
ASLR slide: +0x20000000
Actual location: 0x120000000

We apply this slide to get actual loaded addresses.
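
The same arithmetic as a tiny Swift sketch, using the example numbers above on a 64-bit platform. The variable names are illustrative only.

```swift
let linkedAddress: UInt64 = 0x1_0000_0000  // vmaddr recorded in the binary
let slide: Int = 0x2000_0000               // ASLR slide reported by dyld

// Same shape as the conversion used in the PR: add as Int, reinterpret as UInt.
let actual = UInt(bitPattern: Int(linkedAddress) + slide)

print(String(actual, radix: 16))  // prints "120000000"
```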

Caching Strategy

Results are cached using Thread.callStackReturnAddresses.hashValue as a key. This means:

  • ✅ Same call path = instant cache hit
  • ✅ Different call path = re-check (still fast)
  • ✅ No stale data, as long as two different call stacks do not hash to the same value (hashValue is not guaranteed to be unique)
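
A minimal sketch of that kind of cache, assuming a lock-protected dictionary keyed by the hash of the return addresses. `CallStackCache` and its members are hypothetical names, not the types used in this PR, and two different stacks could in principle collide on the same hash.

```swift
import Foundation

/// Hypothetical cache: maps a call-stack hash to a previously computed answer.
final class CallStackCache {
  private let lock = NSLock()
  private var storage: [Int: Bool] = [:]
  private(set) var misses = 0

  func value(forStack addresses: [NSNumber], orCompute compute: () -> Bool) -> Bool {
    let key = addresses.hashValue
    lock.lock()
    defer { lock.unlock() }
    if let cached = storage[key] { return cached }
    misses += 1
    let result = compute()
    storage[key] = result
    return result
  }
}

let cache = CallStackCache()
let stack = Thread.callStackReturnAddresses
let first = cache.value(forStack: stack) { true }    // computed
let second = cache.value(forStack: stack) { false }  // served from the cache
print(first, second, cache.misses)  // prints "true true 1"
```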

Testing

All existing tests pass. The individual segment checking ensures we correctly distinguish between:

  • ✅ Code running in AttributeGraph's rendering pipeline (should warn)
  • ✅ Code in Task closures, button actions, etc. (should not warn)

References

This optimization was inspired by discussion of the same performance issue in SpamSieve:
https://mjtsai.com/blog/2025/10/03/spamsieve-3-2-1/

…ess checking

Replace Thread.callStackSymbols (which calls expensive findClosestSymbol) with direct Mach-O segment parsing and address range checking. This avoids symbol resolution entirely while maintaining correctness.

Ref: https://mjtsai.com/blog/2025/10/03/spamsieve-3-2-1/
Comment on lines +316 to +317
let start = UInt(bitPattern: Int(segment.vmaddr) + slide)
let end = UInt(bitPattern: Int(segment.vmaddr + segment.vmsize) + slide)

Do we need to convert to UInt before arithmetic to avoid overflow? (Ditto below)

Member

The bitPattern init takes an Int, so I think this is OK.
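
For reference, one way to sidestep the question entirely is to do the addition in UInt with wrapping operators. This is only a sketch: `slidAddress` is a hypothetical helper, and nothing in this thread suggests that AttributeGraph's segments actually sit at addresses where `Int(segment.vmaddr)` would trap.

```swift
/// Hypothetical helper: applies a (possibly negative) slide with wrapping
/// arithmetic, so no intermediate conversion or addition can trap.
func slidAddress(vmaddr: UInt64, slide: Int) -> UInt {
  UInt(truncatingIfNeeded: vmaddr) &+ UInt(bitPattern: slide)
}

// Positive slide: same result as the Int-based arithmetic.
let a = slidAddress(vmaddr: 0x1_0000_0000, slide: 0x2000_0000)

// Negative slide: two's-complement wrapping still subtracts correctly.
let b = slidAddress(vmaddr: 0x1_0000_0000, slide: -0x1000)

// Address above Int.max: Int(vmaddr) would trap here, this does not.
let c = slidAddress(vmaddr: 0xFFFF_FFFF_0000_0000, slide: 0x1000)

print(String(a, radix: 16), String(b, radix: 16), String(c, radix: 16))
```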

Comment on lines +359 to +361
var attributeGraphRanges: [(UInt, UInt)] = []
let imageCount = _dyld_image_count()
for i in 0..<imageCount {

Should attributeGraphRanges be cached for reuse across multiple locations?

Member

I pushed a quick refactor to cache things.

@stephencelis
Member

@robmaceachern Did a cleanup/optimization pass. Want to take things for another spin to make sure I didn't mess anything up?

}
}

private let attributeGraphAddresses: RangeSet = {

some thoughts on the caching:

  1. is it conceivable that this could ever return an empty range but SwiftUI/AG will show up at some future point? like is there a circumstance where we get to this function's executing but somehow the library we're looking for has not yet been loaded? i imagine it's unlikely if at all possible...
  2. if that is possible, perhaps we could force it to load by explicitly looking up a symbol via dladdr() or something?
  3. can the address range data be cached globally, rather than per-instance?

Contributor Author

I think 1 is unlikely but I'm not sure it can be totally ruled out. I don't know if we can safely force load it since afaik none of the symbols are public.

can the address range data be cached globally, rather than per-instance?

I think that's probably a good idea.

Contributor Author

can the address range data be cached globally, rather than per-instance?

Actually, I think it already is! The indentation kind of threw me off for a second but it's not nested inside PerceptionRegistrar.

let imageCount = _dyld_image_count()
for i in 0..<imageCount {
  guard let imageName = _dyld_get_image_name(i) else { continue }
  if String(cString: imageName).hasSuffix("/AttributeGraph") {

i know it wasn't changed here, but why look for AG specifically vs something in SwiftUI/SwiftUICore?

Contributor Author

Just guessing: probably an issue with additional false-positives in the non-View bits of SwiftUI.

Member

This debug code is specifically for letting folks know when a property is accessed from a SwiftUI view that was not wrapped in the WithPerceptionTracking observable view.

@jamieQ Oct 13, 2025

edit: replied to the wrong thread... moving the comment


private let attributeGraphAddresses: RangeSet = {
  var addresses = RangeSet()
  let imageCount = _dyld_image_count()

i see this in the header for many of these methods:

/*
 * The following functions allow you to iterate through all loaded images.  
 * This is not a thread safe operation.  Another thread can add or remove
 * an image during the iteration.  
 *
 * Many uses of these routines can be replace by a call to dladdr() which 
 * will return the mach_header and name of an image, given an address in 
 * the image. dladdr() is thread safe.
 */

not sure if the slide value is accessible via other means though...

Contributor Author

Yeah I guess it's a risk. Maybe the AttributeGraph image could get pushed out of the imageCount range and our checks would be wrong (and cached wrong forever). Seems like low probability and low impact though.

Member

@jamieQ We're open to improvements here, but also this is debug-only code that can be disabled, so we're not sure there's too much of a risk here, especially since AttributeGraph is likely loaded early and forever in a SwiftUI application.


that makes sense. if we're not too concerned with the risks here this seems like this might be fine. it does feel like we should be able to make the runtime do most of this work for us in some manner (e.g. dlopen the private framework, dlsym a symbol name we expect to 'always' exist in AttributeGraph or something along those lines), but maybe this is good enough.

i think the bigger concern than a logical race would be if there is potential for causing data races or crashes via use of these API. e.g. the OSS distributions of dyld suggest there's some risk that reading from the underlying image vector via the _dyld_image_count() API could race on the underlying storage if it were to be resized concurrently, though TBH i'm not familiar enough with C++ to have a sense of how much of a concern this might be in practice. since it looks like the implementation defends against indexing into the loaded images vector with a bad value, that failure mode is presumably eliminated (which was originally my primary worry).


mostly out of curiosity, i messed around a bit with some of the other dyld APIs, and i think there are at least two alternatives that could work to sidestep the thread safety risks. the first is to use dlopen + dlsym + dladdr to get the mach header of the private library we want to derive the address ranges for. something like:

import Foundation
import MachO

func attributeGraphHeader() -> UnsafePointer<mach_header_64>? {
  // get a handle to the library that we expect to exist. don't load it if it isn't yet (unlikely)
  let expectedAGPath = "/System/Library/PrivateFrameworks/AttributeGraph.framework/AttributeGraph"
  guard let handle = dlopen(expectedAGPath, RTLD_NOLOAD) else {
    return nil
  }
  defer { dlclose(handle) }

  // look up a symbol we think should exist in the library
  guard let symbol = dlsym(handle, "AGGraphCreate") else {
    return nil
  }

  // get address info for the symbol, which will give us the base address of the library
  var symbolInfo = Dl_info()
  guard dladdr(symbol, &symbolInfo) != 0 else { // 0 is failure
    return nil
  }

  guard let baseAddress = symbolInfo.dli_fbase else {
    return nil
  }

  // the base address is where the library's Mach-O header lives
  let header = baseAddress.assumingMemoryBound(to: mach_header_64.self)

  // some similar logic as below to derive the address range from the header
  // ...
  return UnsafePointer(header)
}

the second would be to use the _dyld_register_func_for_add_image() function to immediately get callbacks with the mach headers and slide values for all currently loaded images. within the implementation of the callback function we'd presumably have to do something like a dladdr() of the mach header we're passed, and then perform logic similar to what we have here (skip the library names we don't care about, then derive and cache the address ranges we want to handle). i didn't test this one too much, but i think it would be structurally quite similar to the current approach, but with fewer underscored API calls. once the callback is registered i don't think it can be unregistered though, so there isn't really a way to 'exit early' once you set it up, and it will continue to get callbacks if future images are loaded during execution (so the implementation would have to deal with that).
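
A rough sketch of that second approach, for illustration only. `ImageRegistry` and `imageRegistry` are hypothetical names, the callback just records every image it is told about, and whether calling dladdr() from inside the callback is appropriate in all situations is an assumption that would need checking.

```swift
import Foundation
import MachO

/// Hypothetical registry filled in by the dyld callback below.
final class ImageRegistry: @unchecked Sendable {
  private let lock = NSLock()
  private var images: [(path: String, header: UInt, slide: Int)] = []

  func add(path: String, header: UInt, slide: Int) {
    lock.lock(); defer { lock.unlock() }
    images.append((path, header, slide))
  }

  func snapshot() -> [(path: String, header: UInt, slide: Int)] {
    lock.lock(); defer { lock.unlock() }
    return images
  }
}

let imageRegistry = ImageRegistry()

// dyld invokes the callback once for every image that is already loaded,
// and again whenever another image is loaded later.
_dyld_register_func_for_add_image { header, slide in
  guard let header = header else { return }
  var info = Dl_info()
  // dladdr maps the header address back to the image's path.
  guard dladdr(header, &info) != 0, let cPath = info.dli_fname else { return }
  imageRegistry.add(
    path: String(cString: cPath),
    header: UInt(bitPattern: header),
    slide: slide
  )
}

let attributeGraph = imageRegistry.snapshot().first {
  $0.path.hasSuffix("/AttributeGraph")
}
print(attributeGraph?.path ?? "AttributeGraph is not loaded in this process")
```

The callback is a C function pointer and cannot capture local state, which is why the registry lives in a global here. Deriving the segment ranges from the recorded header and slide would follow the same load-command walk as the current implementation.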

one additional thought on the ASLR 'slide' values... anecdotally, when testing, the slide values for stuff from the 'dyld shared cache' seemed to be reported as zero (which is where i would expect all the system framework stuff to be on Darwin targets). i'm not sure exactly what that means or implies though TBH. if we need to figure out the slide value and can't just assume it will be zero in this case, the first approach might not work, if we do, the second one should provide the value in the callback function.

Member

Thanks for looking into things!

We'd be down for the first approach if you're open to PR it. My main question about the snippet above is if the code should eager/lazy-load AttributeGraph as a precaution, rather than avoiding it via RTLD_NOLOAD, since these addresses will be cached (unless you're suggesting always loading the addresses fresh each isSwiftUI check?).

We've decided to merge this PR as is since it's a huge improvement to developer QoL, but don't take that to mean we're not interested in more improvements down the line.

@stephencelis stephencelis marked this pull request as ready for review October 11, 2025 00:09
@stephencelis
Member

I don't think there's any rush to merge/release, so I'll let things simmer over the weekend. If anyone wants to suggest any more improvements (@jamieQ?) please do!

@stephencelis stephencelis merged commit 4f47eba into pointfreeco:main Oct 14, 2025
3 checks passed