Optimize isSwiftUI() by replacing symbol resolution with segment address checking by robmaceachern · Pull Request #165 · pointfreeco/swift-perception

robmaceachern · 2025-10-10T18:04:33Z

Optimize isSwiftUI() performance by avoiding expensive symbol resolution

Warning

This is just for discussion at this point. This was implemented with an AI agent and I haven't fully digested it, but it does seem to avoid the severe performance issues we were seeing and all existing iOS tests pass. macOS tests fail but they also fail on main today.

AI-generated summary in details.

Details

Summary

Profiling revealed that isSwiftUI() was a performance bottleneck due to expensive symbol resolution calls. This PR replaces the symbol-based approach with direct memory address checking against Mach-O segment ranges, providing significant performance improvements while maintaining correctness.

Problem

The isSwiftUI() method detects whether code is being called from SwiftUI's AttributeGraph rendering pipeline (to warn developers about untracked state access). The original implementation worked like this:

return Thread.callStackSymbols.reversed().contains { 
  $0.contains("AttributeGraph ")
}

Why This Was Slow

When you call Thread.callStackSymbols, the runtime has to convert every return address on the stack into a human-readable symbol name. This triggers a chain of expensive operations:

backtrace_symbols() - Library call that symbolicates addresses
dladdr() - Looks up which library owns each address
findClosestSymbol() - Scans symbol tables to find function names

Profiling showed that findClosestSymbol() was the bottleneck, as it needs to search through potentially thousands of symbols for every stack frame. This happened on every property access in debug builds, making the app noticeably slower.

Solution

Instead of converting addresses to symbol names, we can check addresses directly against the memory ranges where AttributeGraph is loaded. Think of it like checking if a house number is on Main Street without needing to look up the resident's name.

How It Works

1. Find AttributeGraph in Memory

The OS loads each framework (like AttributeGraph) into its own memory regions. We scan all loaded images using dyld (the dynamic linker):

let imageCount = _dyld_image_count()  // How many images are loaded?
for i in 0..<imageCount {
  // Get the image path (a C string, nil if the index is out of range)
  guard let name = _dyld_get_image_name(i) else { continue }
  if String(cString: name).contains("AttributeGraph") {
    // Found it! Now get its memory address...
  }
}

2. Parse Mach-O Segments

Each framework is a Mach-O binary (the executable format on Apple platforms). These binaries are divided into segments - contiguous blocks of memory for different purposes:

__TEXT - Executable code
__DATA - Mutable data
__LINKEDIT - Dynamic linking info
etc.

We parse these segments from the Mach-O header to get their exact address ranges:

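// Inside the image loop from step 1.
// Start at the Mach-O header; the slide is this image's ASLR offset
guard let header = _dyld_get_image_header(i) else { continue }
let slide = _dyld_get_image_vmaddr_slide(i)

// Walk through the "load commands" that follow the 64-bit header
var cursor = UnsafeRawPointer(header) + MemoryLayout<mach_header_64>.size
for _ in 0..<header.pointee.ncmds {
  let command = cursor.load(as: load_command.self)
  if command.cmd == UInt32(LC_SEGMENT_64) {  // It's a segment!
    let segment = cursor.load(as: segment_command_64.self)

    // Calculate where this segment is actually loaded:
    // vmaddr = the segment's preferred (unslid) virtual address
    // slide = ASLR offset (security randomization)
    let actualStart = UInt(bitPattern: Int(segment.vmaddr) + slide)
    let actualEnd = UInt(bitPattern: Int(segment.vmaddr + segment.vmsize) + slide)

    ranges.append((actualStart, actualEnd))
  }
  cursor += Int(command.cmdsize)  // Advance to the next load command
}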

3. Check Return Addresses

Now we can check if any address in our call stack falls within AttributeGraph's memory:

let addresses = Thread.callStackReturnAddresses  // Just raw pointers, boxed as [NSNumber]
return addresses.contains { number in
  let address = number.uintValue
  return attributeGraphRanges.contains { start, end in
    address >= start && address < end  // Simple integer comparison!
  }
}
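Putting the three steps together, a minimal self-contained sketch could look like the following. It is an illustration under stated assumptions, not the code merged in this PR: the helper names `segmentRanges(forImageWithSuffix:)` and `callStackIntersects(_:)` are hypothetical, and the sketch assumes a 64-bit Darwin process where every image uses `mach_header_64`.

```swift
#if canImport(Darwin)
import Foundation
import MachO

/// Hypothetical helper: collects the loaded address range of every segment
/// of the first image whose path ends in `suffix`.
func segmentRanges(forImageWithSuffix suffix: String) -> [Range<UInt>] {
  var ranges: [Range<UInt>] = []
  for i in 0..<_dyld_image_count() {
    guard
      let name = _dyld_get_image_name(i),
      String(cString: name).hasSuffix(suffix),
      let header = _dyld_get_image_header(i)
    else { continue }
    let slide = _dyld_get_image_vmaddr_slide(i)
    // Load commands start right after the 64-bit Mach-O header.
    var cursor = UnsafeRawPointer(header) + MemoryLayout<mach_header_64>.size
    for _ in 0..<header.pointee.ncmds {
      let command = cursor.load(as: load_command.self)
      if command.cmd == UInt32(LC_SEGMENT_64) {
        let segment = cursor.load(as: segment_command_64.self)
        let start = UInt(bitPattern: Int(segment.vmaddr) + slide)
        let end = UInt(bitPattern: Int(segment.vmaddr + segment.vmsize) + slide)
        // Skip empty segments so every stored range is well formed.
        if start < end { ranges.append(start..<end) }
      }
      cursor += Int(command.cmdsize)
    }
    break  // Only the first matching image is considered.
  }
  return ranges
}

/// True when any return address on the current stack lies inside `ranges`.
func callStackIntersects(_ ranges: [Range<UInt>]) -> Bool {
  Thread.callStackReturnAddresses.contains { number in
    ranges.contains { $0.contains(number.uintValue) }
  }
}
#endif
```

A caller would compute `segmentRanges(forImageWithSuffix: "/AttributeGraph")` once and pass the result to `callStackIntersects(_:)` on each check. Storing one `Range<UInt>` per segment keeps the check to plain integer comparisons.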

Critical Fix: Individual Segments vs. Giant Range

An early version of this PR used min(all segments) to max(all segments), creating one giant range. This caused test failures because segments aren't always contiguous:

Segment 1: 0x1c00c1000 - 0x1c00c5000 (TEXT)
[gap with other frameworks]
Segment 2: 0x2297c0000 - 0x2297c4000 (LINKEDIT)

Using one range (0x1c00c1000 - 0x2297c4000) would incorrectly match addresses in the 1.6GB gap! The fix is checking each segment individually.
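The false positive can be reproduced with the two segment ranges quoted above. This is only an illustrative sketch: `unrelated` is a made-up address chosen to fall in the gap, standing in for a return address that belongs to some other framework.

```swift
// Segment addresses taken from the example above (64-bit platform assumed).
let text: Range<UInt> = 0x1c00c1000..<0x1c00c5000
let linkedit: Range<UInt> = 0x2297c0000..<0x2297c4000
let segments = [text, linkedit]

// The early approach: one range spanning min(start) to max(end).
let giantRange = text.lowerBound..<linkedit.upperBound

// Hypothetical return address that lies in the gap between the two segments.
let unrelated: UInt = 0x1d0000000

// The giant range claims the address; the per-segment check does not.
let matchesGiantRange = giantRange.contains(unrelated)
let matchesAnySegment = segments.contains { $0.contains(unrelated) }
```

Here `matchesGiantRange` is true while `matchesAnySegment` is false, which is the difference between warning incorrectly and staying quiet.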

Performance

Before:

Symbol resolution for every stack frame
Thousands of symbol table lookups
User-visible slowness in debug builds

After:

Scan ~300-500 framework names (fast string checks)
Parse ~5-10 segments for AttributeGraph (simple memory reads)
Check ~20-40 addresses (integer comparisons)
Results cached by call stack hash (subsequent calls are instant)

The new approach avoids all symbol table operations, providing significant speedup while maintaining accuracy.

Technical Notes

Memory Safety

The implementation uses UnsafePointer to read Mach-O headers, which is safe because:

We're reading from dyld-provided addresses (guaranteed valid)
We only read, never write
We check bounds (number of commands, segment sizes)

ASLR (Address Space Layout Randomization)

For security, the OS randomizes where images load in memory. The "slide" is this random offset:

Binary says: TEXT starts at 0x100000000
ASLR slide: +0x20000000
Actual location: 0x120000000

We apply this slide to get actual loaded addresses.

Caching Strategy

Results are cached using Thread.callStackReturnAddresses.hashValue as a key. This means:

✅ Same call path = instant cache hit
✅ Different call path = re-check (still fast)
✅ No stale data (the cache key is derived from the full call stack, though two different stacks could in principle collide on the same hash)
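A cache of this shape can be sketched as follows. The `CallStackCache` type is hypothetical and is not the PR's implementation; it only shows the idea of keying a boolean answer on the hash of the return addresses. Because the key is a hash rather than the addresses themselves, two different stacks that happen to share a hash would also share an answer.

```swift
import Foundation

/// Hypothetical cache: maps the hash of a call stack to a previously computed answer.
final class CallStackCache {
  private var storage: [Int: Bool] = [:]
  private let lock = NSLock()

  /// Returns the cached answer for `addresses`, running `compute` only on a miss.
  func value(for addresses: [NSNumber], compute: () -> Bool) -> Bool {
    let key = addresses.hashValue
    lock.lock()
    defer { lock.unlock() }
    if let cached = storage[key] { return cached }
    let result = compute()
    storage[key] = result
    return result
  }
}
```

In use, the caller would pass `Thread.callStackReturnAddresses` as `addresses` and the segment-range check as `compute`.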

Testing

All existing tests pass. The individual segment checking ensures we correctly distinguish between:

✅ Code running in AttributeGraph's rendering pipeline (should warn)
✅ Code in Task closures, button actions, etc. (should not warn)

References

This optimization was inspired by discussion of the same performance issue in SpamSieve:
https://mjtsai.com/blog/2025/10/03/spamsieve-3-2-1/

…ess checking Replace Thread.callStackSymbols (which calls expensive findClosestSymbol) with direct Mach-O segment parsing and address range checking. This avoids symbol resolution entirely while maintaining correctness. Ref: https://mjtsai.com/blog/2025/10/03/spamsieve-3-2-1/

square-tomb · 2025-10-10T19:02:15Z

Sources/PerceptionCore/Perception/PerceptionRegistrar.swift

+              let start = UInt(bitPattern: Int(segment.vmaddr) + slide)
+              let end = UInt(bitPattern: Int(segment.vmaddr + segment.vmsize) + slide)


Do we need to convert to UInt before arithmetic to avoid overflow? (Ditto below)

The bitPattern init takes an Int, so I think this is OK.
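For reference, `Int(_:)` traps at run time when its argument does not fit in an `Int`, so the conversion in the diff relies on `vmaddr` staying below `Int.max`. If that assumption were ever a concern, one alternative is wrapping arithmetic on the unsigned value. The `slidAddress(_:slide:)` helper below is hypothetical and uses the numbers from the ASLR example in the PR description.

```swift
/// Hypothetical helper: applies a signed ASLR slide to an unslid 64-bit address
/// using wrapping arithmetic, so no intermediate conversion can trap.
func slidAddress(_ vmaddr: UInt64, slide: Int) -> UInt64 {
  // Reinterpret the signed slide as unsigned bits; &+ wraps instead of trapping.
  vmaddr &+ UInt64(bitPattern: Int64(slide))
}

// 0x100000000 plus a slide of 0x20000000, as in the ASLR example.
let slid = slidAddress(0x1_0000_0000, slide: 0x2000_0000)

// A negative slide undoes it again.
let unslid = slidAddress(slid, slide: -0x2000_0000)
```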

square-tomb · 2025-10-10T19:04:51Z

Sources/PerceptionCore/Perception/PerceptionRegistrar.swift

+        var attributeGraphRanges: [(UInt, UInt)] = []
+        let imageCount = _dyld_image_count()
+        for i in 0..<imageCount {


Should attributeGraphRanges be cached for reuse across multiple locations?

I pushed a quick refactor to cache things.

stephencelis · 2025-10-10T19:34:51Z

@robmaceachern Did a cleanup/optimization pass. Want to take things for another spin to make sure I didn't mess anything up?

Sources/PerceptionCore/Perception/PerceptionRegistrar.swift

jamieQ · 2025-10-10T20:18:04Z

Sources/PerceptionCore/Perception/PerceptionRegistrar.swift

    }
  }
+
+  private let attributeGraphAddresses: RangeSet = {


some thoughts on the caching:

1. is it conceivable that this could ever return an empty range but SwiftUI/AG will show up at some future point? like is there a circumstance where we get to this function's executing but somehow the library we're looking for has not yet been loaded? i imagine it's unlikely if at all possible...

   if that is possible, perhaps we could force it to load by explicitly looking up a symbol via dladdr() or something?

2. can the address range data be cached globally, rather than per-instance?

I think 1 is unlikely but I'm not sure it can be totally ruled out. I don't know if we can safely force load it since afaik none of the symbols are public.

can the address range data be cached globally, rather than per-instance?

I think that's probably a good idea.

can the address range data be cached globally, rather than per-instance?

Actually, I think it already is! The indentation kind of threw me off for a second but it's not nested inside PerceptionRegistrar.
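As a small illustration of why that works: Swift initializes static stored properties lazily and at most once, and the same holds for globals declared outside main.swift, so every instance shares one computed value. The names `AddressCache`, `Registrar`, and the placeholder range below are hypothetical and exist only for this sketch.

```swift
enum AddressCache {
  // Instrumentation for this sketch only: counts how often the initializer runs.
  static var initializerRuns = 0

  // Initialized on first access and never again, regardless of how many
  // Registrar values read it.
  static let ranges: [Range<UInt>] = {
    AddressCache.initializerRuns += 1
    return [0x1000..<0x2000]  // placeholder for the real segment ranges
  }()
}

struct Registrar {
  func contains(_ address: UInt) -> Bool {
    AddressCache.ranges.contains { $0.contains(address) }
  }
}
```

Two separate `Registrar` values calling `contains(_:)` leave `AddressCache.initializerRuns` at 1.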

jamieQ · 2025-10-10T20:28:25Z

Sources/PerceptionCore/Perception/PerceptionRegistrar.swift

+    let imageCount = _dyld_image_count()
+    for i in 0..<imageCount {
+      guard let imageName = _dyld_get_image_name(i) else { continue }
+      if String(cString: imageName).hasSuffix("/AttributeGraph") {


i know it wasn't changed here, but why look for AG specifically vs something in SwiftUI/SwiftUICore?

Just guessing: probably an issue with additional false-positives in the non-View bits of SwiftUI.

This debug code is specifically for letting folks know when a property is accessed from a SwiftUI view that was not wrapped in the WithPerceptionTracking observable view.

edit: replied to the wrong thread... moving the comment

jamieQ · 2025-10-10T21:06:06Z

Sources/PerceptionCore/Perception/PerceptionRegistrar.swift

+
+  private let attributeGraphAddresses: RangeSet = {
+    var addresses = RangeSet()
+    let imageCount = _dyld_image_count()


i see this in the header for many of these methods:

/*
 * The following functions allow you to iterate through all loaded images.
 * This is not a thread safe operation. Another thread can add or remove
 * an image during the iteration.
 *
 * Many uses of these routines can be replace by a call to dladdr() which
 * will return the mach_header and name of an image, given an address in
 * the image. dladdr() is thread safe.
 */

not sure if the slide value is accessible via other means though...

Yeah I guess it's a risk. Maybe the AttributeGraph image could get pushed out of the imageCount range and our checks would be wrong (and cached wrong forever). Seems like low probability and low impact though.

@jamieQ We're open to improvements here, but also this is debug-only code that can be disabled, so we're not sure there's too much of a risk here, especially since AttributeGraph is likely loaded early and forever in a SwiftUI application.

that makes sense. if we're not too concerned with the risks here this seems like this might be fine. it does feel like we should be able to make the runtime do most of this work for us in some manner (e.g. dlopen the private framework, dlsym a symbol name we expect to 'always' exist in AttributeGraph or something along those lines), but maybe this is good enough.

i think the bigger concern than a logical race would be if there is potential for causing data races or crashes via use of these API. e.g. the OSS distributions of dyld suggest there's some risk that reading from the underlying image vector via the _dyld_image_count() API could race on the underlying storage if it were to be resized concurrently, though TBH i'm not familiar enough with C++ to have a sense of how much of a concern this might be in practice. since it looks like the implementation defends against indexing into the loaded images vector with a bad value, that failure mode is presumably eliminated (which was originally my primary worry).

mostly out of curiosity, i messed around a bit with some of the other dyld APIs, and i think there are at least two alternatives that could work to sidestep the thread safety risks. the first is to use dlopen + dlsym + dladdr to get the mach header of the private library we want to derive the address ranges for. something like:

// get a handle to the library that we expect to exist. don't load it if it isn't yet (unlikely)
let expectedAGPath = "/System/Library/PrivateFrameworks/AttributeGraph.framework/AttributeGraph"
guard let handle = dlopen(expectedAGPath, RTLD_NOLOAD) else {
  return nil
}
defer { dlclose(handle) }

// look up a symbol we think should exist in the library
guard let symbol = dlsym(handle, "AGGraphCreate") else {
  return nil
}

// get address info for the symbol, which will give us the base address of the library
var symbolInfo = Dl_info()
guard dladdr(symbol, &symbolInfo) != 0 else { // 0 is failure
  return nil
}
guard let baseAddress = symbolInfo.dli_fbase else {
  return nil
}

// Parse Mach-O header to get the size
let header = baseAddress.assumingMemoryBound(to: mach_header_64.self)
// some similar logic as below to derive the address range from the header
// ...

the second would be to use the _dyld_register_func_for_add_image() function to immediately get callbacks with the mach headers and slide values for all currently loaded images. within the implementation of the callback function we'd presumably have to do something like a dladdr() of the mach header we're passed, and then perform logic similar to what we have here (skip the library names we don't care about, then derive and cache the address ranges we want to handle). i didn't test this one too much, but i think it would be structurally quite similar to the current approach, but with fewer underscored API calls. once the callback is registered i don't think it can be unregistered though, so there isn't really a way to 'exit early' once you set it up, and it will continue to get callbacks if future images are loaded during execution (so the implementation would have to deal with that).
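A rough sketch of the callback-based idea, for discussion only. `ImageLog` and `startObservingImages()` are hypothetical names, the sketch merely records each image's path and slide instead of parsing segments, and it assumes that calling dladdr() from inside the callback is acceptable. A C callback cannot capture context, which is why the results go into static storage.

```swift
#if canImport(Darwin)
import Foundation
import MachO

/// Hypothetical storage for what the callback observes.
enum ImageLog {
  static let lock = NSLock()
  static var entries: [(path: String, slide: Int)] = []
}

/// Registers the callback. dyld invokes it for every image that is already
/// loaded and again for each image loaded later; it cannot be unregistered.
func startObservingImages() {
  _dyld_register_func_for_add_image { header, slide in
    guard let header = header else { return }
    // dladdr on the header address yields the image's path.
    var info = Dl_info()
    guard dladdr(header, &info) != 0, let name = info.dli_fname else { return }
    let path = String(cString: name)
    ImageLog.lock.lock()
    ImageLog.entries.append((path: path, slide: slide))
    ImageLog.lock.unlock()
  }
}
#endif
```

A real implementation would filter on the AttributeGraph path inside the callback and derive the segment ranges from `header` and `slide` there.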

one additional thought on the ASLR 'slide' values... anecdotally, when testing, the slide values for stuff from the 'dyld shared cache' seemed to be reported as zero (which is where i would expect all the system framework stuff to be on Darwin targets). i'm not sure exactly what that means or implies though TBH. if we can't just assume the slide will be zero in this case and need to work it out, the first approach might not work; the second one should provide the value in the callback function.
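One way the slide question might be sidestepped with the dladdr() route is to derive the slide from the header itself: if the Mach-O header sits at the start of the image's __TEXT segment, the slide is the header's address minus the vmaddr recorded for __TEXT. The sketch below rests on that assumption and on 64-bit images. `segmentRanges(containing:)` and `isNamed(_:_:)` are hypothetical helpers, not code from this PR.

```swift
#if canImport(Darwin)
import Foundation
import MachO

/// Compares a segment's fixed-size, NUL-padded name field with `name`.
func isNamed(_ segment: segment_command_64, _ name: String) -> Bool {
  withUnsafeBytes(of: segment.segname) { raw in
    String(decoding: raw.prefix(while: { $0 != 0 }), as: UTF8.self) == name
  }
}

/// Hypothetical helper: given the address of any symbol in an image, returns the
/// loaded ranges of that image's segments without the image-iteration functions.
func segmentRanges(containing symbol: UnsafeRawPointer) -> [Range<UInt>] {
  var info = Dl_info()
  guard dladdr(symbol, &info) != 0, let base = info.dli_fbase else { return [] }
  let header = UnsafeRawPointer(base)
  let ncmds = header.load(as: mach_header_64.self).ncmds

  // Collect every LC_SEGMENT_64 load command.
  var segments: [segment_command_64] = []
  var cursor = header + MemoryLayout<mach_header_64>.size
  for _ in 0..<ncmds {
    let command = cursor.load(as: load_command.self)
    if command.cmd == UInt32(LC_SEGMENT_64) {
      segments.append(cursor.load(as: segment_command_64.self))
    }
    cursor += Int(command.cmdsize)
  }

  // Assumption: the header is the first byte of __TEXT, so
  // slide = header address - __TEXT.vmaddr (wrapping, in case it is "negative").
  guard let text = segments.first(where: { isNamed($0, "__TEXT") }) else { return [] }
  let slide = UInt(bitPattern: header) &- UInt(text.vmaddr)
  return segments.compactMap { segment -> Range<UInt>? in
    let start = UInt(segment.vmaddr) &+ slide
    let end = start &+ UInt(segment.vmsize)
    return start < end ? start..<end : nil
  }
}
#endif
```

If the recorded addresses are already final, the computed slide comes out as zero and the ranges are used unchanged, so the same code covers both cases.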

Thanks for looking into things!

We'd be down for the first approach if you're open to PR it. My main question about the snippet above is if the code should eager/lazy-load AttributeGraph as a precaution, rather than avoiding it via RTLD_NOLOAD, since these addresses will be cached (unless you're suggesting always loading the addresses fresh each isSwiftUI check?).

We've decided to merge this PR as is since it's a huge improvement to developer QoL, but don't take that to mean we're not interested in more improvements down the line.

stephencelis · 2025-10-11T00:10:57Z

I don't think there's any rush to merge/release, so I'll let things simmer over the weekend. If anyone wants to suggest any more improvements (@jamieQ?) please do!

square-tomb reviewed Oct 10, 2025


wip

97f6d6b

wip

a72dfc6

jamieQ reviewed Oct 10, 2025


stephencelis added 2 commits October 10, 2025 17:04

wip

9464a07

wip

5987634

stephencelis marked this pull request as ready for review October 11, 2025 00:09

stephencelis merged commit 4f47eba into pointfreeco:main Oct 14, 2025
3 checks passed


4 participants
