Improve test suite reliability with randomization, TSan, and flaky test fixes #2823

Draft
Valpertui wants to merge 4 commits into develop from valpertui/fix/tests-suite-improvements

@Valpertui
Member

What and why?

The iOS SDK test suite has been experiencing intermittent CI failures (~50% pass rate locally with make test-ios-all). This PR addresses the root causes of flakiness and adds infrastructure to detect order-dependent and thread-unsafe tests earlier.

How?

1. Enable test randomization across all 20 schemes
Adds randomExecutionOrdering = "YES" to every test scheme to surface order-dependent test failures that were hidden by deterministic execution.

2. Extend Thread Sanitizer coverage
Enables TSan on DatadogInternal, DatadogRUM, DatadogLogs, and DatadogTrace (iOS + tvOS) where it was previously missing. TSan was already enabled on DatadogCore, DatadogCrashReporting, and IntegrationTests.

3. Fix timing-sensitive tests
Tests with timeouts too tight for CI environments:

  • AppHangsWatchdogThreadTests: threshold 0.1s→0.5s, wait multiplier 10x→15x, duration tolerance uses Constants.tolerance + ciPadding
  • Profiling concurrency tests (CTorProfiler, MachSamplingProfiler, SafeRead, AppLaunchProfiler): timeout 0.1s→2.0s for concurrentPerform waits
  • AppStateManagerTests: 0.1s→2.0s for async data store operations
  • DisplayLinkerTests: wait(during:) 0.1s→0.25s for CADisplayLink callbacks
  • VitalInfoSamplerTests: XCTAssertEqual(sampleCount, 2) → XCTAssertGreaterThanOrEqual (timer scheduling can produce extra samples)
  • WatchdogTerminationsMonitoringTests: added 10s deadline to unbounded polling loop
  • ViewHitchesIntegrationTests: increased wait for frame hitch generation
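
A bounded polling loop of the kind described for WatchdogTerminationsMonitoringTests could look roughly like the sketch below. The helper name `pollUntil` and its parameters are illustrative assumptions, not code from this PR; only Foundation is used so the snippet runs on its own.

```swift
import Foundation

/// Hypothetical helper (not from the PR): polls `condition` until it returns
/// true or `timeout` seconds have elapsed, so the loop can never spin forever.
func pollUntil(
    timeout: TimeInterval,
    interval: TimeInterval = 0.01,
    condition: () -> Bool
) -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while Date() < deadline {
        if condition() { return true }
        // Sleep briefly between checks instead of burning CPU.
        Thread.sleep(forTimeInterval: interval)
    }
    // One last check so a condition that became true right at the deadline still counts.
    return condition()
}

// Usage: the condition becomes true on the third check, well before the deadline.
var checks = 0
let satisfied = pollUntil(timeout: 1.0) {
    checks += 1
    return checks >= 3
}
```

In a test, the returned Bool would typically be asserted (for example with XCTAssertTrue), so that hitting the deadline fails the test instead of hanging the suite.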

4. Fix data races in URLSessionTaskStateSwizzlerTests
interceptedStates (plain Array) and interceptionCount (plain Int) were mutated from concurrent URLSession delegate callbacks — a genuine data race. Wrapped in ThreadSafeStates and ThreadSafeCounter. Also replaced Thread.sleep(1) with expectation-based waiting.
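
The PR does not show how ThreadSafeStates and ThreadSafeCounter are implemented. As a minimal sketch of the general idea, assuming a plain NSLock guards each mutation, lock-protected wrappers might look like this (`LockedCounter` and `LockedList` are hypothetical names standing in for the PR's types):

```swift
import Foundation
import Dispatch

/// Hypothetical stand-in for a thread-safe counter: every access takes the lock.
final class LockedCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

/// Hypothetical stand-in for a thread-safe list of recorded states.
final class LockedList<Element>: @unchecked Sendable {
    private let lock = NSLock()
    private var items: [Element] = []

    func append(_ element: Element) {
        lock.lock()
        defer { lock.unlock() }
        items.append(element)
    }

    var snapshot: [Element] {
        lock.lock()
        defer { lock.unlock() }
        return items
    }
}

/// Mutates both wrappers from many threads at once, as concurrent callbacks would.
func runConcurrentDemo(iterations: Int) -> (count: Int, recorded: Int) {
    let counter = LockedCounter()
    let states = LockedList<Int>()
    DispatchQueue.concurrentPerform(iterations: iterations) { index in
        counter.increment()
        states.append(index)
    }
    return (counter.current, states.snapshot.count)
}

let demo = runConcurrentDemo(iterations: 1000)
```

With a plain Int and Array in place of the wrappers, the same loop would be a data race that Thread Sanitizer reports; with the locks, every increment and append is serialized.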

5. Fix flaky KSCrashBacktraceTests
testGenerateBacktraceForBackgroundThread was asserting that DatadogCrashReportingTests and Foundation appear in binary images, but the background thread was blocked on semaphore.wait() (only system frames on stack). Fixed by busy-spinning the thread inside a @inline(never) function in the test module, so user code frames are on the stack at capture time. Restores the user image assertion.
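
The busy-spin pattern can be sketched as follows. This is an illustration under assumptions, not the PR's code: `StopFlag`, `spinUntilStopped`, and `runSpinDemo` are hypothetical names, and the backtrace capture itself is left as a comment because the capture call is not shown in the PR description.

```swift
import Foundation
import Dispatch

/// Hypothetical lock-guarded flag shared between the test and the worker thread.
final class StopFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var stopped = false

    var isStopped: Bool {
        lock.lock()
        defer { lock.unlock() }
        return stopped
    }

    func stop() {
        lock.lock()
        defer { lock.unlock() }
        stopped = true
    }
}

/// Marked @inline(never) so the function keeps its own stack frame in this
/// module instead of being folded into its caller.
@inline(never)
func spinUntilStopped(_ flag: StopFlag, ready: DispatchSemaphore) -> Int {
    ready.signal()              // tell the test that the thread is now inside this frame
    var iterations = 0
    while !flag.isStopped {     // busy-spin: the thread stays in user code, not blocked in a system wait
        iterations &+= 1
    }
    return iterations
}

/// Starts the worker, waits until it is spinning, then releases it.
func runSpinDemo() -> Bool {
    let flag = StopFlag()
    let ready = DispatchSemaphore(value: 0)
    let finished = DispatchSemaphore(value: 0)

    let worker = Thread {
        _ = spinUntilStopped(flag, ready: ready)
        finished.signal()
    }
    worker.start()

    ready.wait()
    // At this point the worker is executing spinUntilStopped; a backtrace of
    // that thread would be captured here (capture call omitted in this sketch).
    flag.stop()
    return finished.wait(timeout: .now() + 5) == .success
}

let workerFinished = runSpinDemo()
```

The contrast with the original test is that a thread parked in semaphore.wait() has only system frames at the top of its stack, whereas a spinning thread is still executing code from the test binary when the capture happens.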

Local performance impact (10-run average):

| Category | Before | After | Delta |
|---|---|---|---|
| Full suite (11 modules) | 224.5s | 241.1s | +16.6s (+7.4%) |
| TSan-new modules (4) | 53.9s | 69.9s | +16.0s (+29.7%) |
| Other modules (7) | 170.6s | 171.2s | +0.6s (+0.4%) |

The overhead comes almost entirely from TSan instrumentation on the 4 newly-enabled modules (+16.0s of the +16.6s total). Timeout changes have near-zero wall-clock impact (they're ceilings, not delays).
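
To illustrate why a larger timeout is a ceiling and not a delay, here is a small sketch using DispatchSemaphore in place of an XCTest expectation. The function name `waitWithCeiling` and the specific delays are assumptions chosen for the example.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: waits for background work with an upper bound of
/// `ceiling` seconds and reports whether the work finished and how long the wait took.
func waitWithCeiling(
    workDelay: TimeInterval,
    ceiling: TimeInterval
) -> (signaled: Bool, elapsed: TimeInterval) {
    let done = DispatchSemaphore(value: 0)
    let start = Date()

    // Simulated asynchronous work that completes after `workDelay` seconds.
    DispatchQueue.global().asyncAfter(deadline: .now() + workDelay) {
        done.signal()
    }

    // The wait returns as soon as the work signals; the ceiling only matters on failure.
    let result = done.wait(timeout: .now() + ceiling)
    return (result == .success, Date().timeIntervalSince(start))
}

// Work takes about 10 ms; the 2 s ceiling is never reached.
let fast = waitWithCeiling(workDelay: 0.01, ceiling: 2.0)
```

The wait ends when the work signals, so raising the ceiling from 0.1s to 2.0s only lengthens runs in which the work would otherwise have timed out.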

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes N/A — internal test infrastructure only
  • Add Objective-C interface for public APIs N/A — no public API changes
  • Run make api-surface N/A — no API changes

Enable randomExecutionOrdering on all 20 test schemes to surface
order-dependent test failures. Add Thread Sanitizer to DatadogInternal,
DatadogRUM, DatadogLogs, and DatadogTrace (iOS + tvOS) where it was
previously missing.
Increase timeouts for tests that use real threading, timers, or
concurrent dispatch where 0.1s ceilings are too tight for CI:

- AppHangsWatchdogThreadTests: raise threshold from 0.1s to 0.5s,
  widen wait multiplier from 10x to 15x, use Constants.tolerance + CI
  padding for duration assertions
- Profiling concurrency tests: 0.1s to 2.0s for concurrentPerform waits
- AppStateManagerTests: 0.1s to 2.0s for async data store operations
- DisplayLinkerTests: wait(during:) from 0.1s to 0.25s for CADisplayLink
- VitalInfoSamplerTests: use GreaterThanOrEqual for sample count
- AppHangsMonitoringTests: raise threshold and hang duration
- WatchdogTerminationsMonitoringTests: add 10s deadline to polling loop
- ViewHitchesTests: increase wait for frame hitch generation
…acktraceTests

URLSessionTaskStateSwizzlerTests: wrap interceptedStates and
interceptionCount in thread-safe types (ThreadSafeStates,
ThreadSafeCounter) to fix data races from concurrent URLSession
callbacks. Replace Thread.sleep(1) with expectation-based waiting.

KSCrashBacktraceTests: fix testGenerateBacktraceForBackgroundThread
by busy-spinning the background thread inside a @inline(never) user
code function so the backtrace captures user binary image frames.
Restores the assertion that DatadogCrashReportingTests appears in
binary images.
@Valpertui force-pushed the valpertui/fix/tests-suite-improvements branch from 22814b2 to efb6319 on April 7, 2026 16:49
Each test called span.setActive() which enters an os_activity scope,
but never called span.finish() which leaves it. With randomized test
execution, accumulated nested os_activity scopes corrupted the activity
hierarchy, causing getActiveSpan() to return nil in subsequent tests.
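
The leak described above can be pictured with a toy model. The sketch below does not use os_activity or the SDK's span API; `ScopeStack` and `ToySpan` are hypothetical types that only count entered and left scopes, to show how pairing setActive() with finish() (here via defer) keeps the nesting balanced across tests.

```swift
import Foundation

/// Toy stand-in for a nested activity scope; NOT the os_activity API.
final class ScopeStack {
    private(set) var depth = 0

    func enter() { depth += 1 }

    func leave() {
        precondition(depth > 0, "leave() without a matching enter()")
        depth -= 1
    }
}

/// Hypothetical span: setActive() enters a scope, finish() leaves it.
final class ToySpan {
    private let scopes: ScopeStack
    private var isActive = false

    init(scopes: ScopeStack) { self.scopes = scopes }

    func setActive() {
        guard !isActive else { return }
        scopes.enter()
        isActive = true
    }

    func finish() {
        guard isActive else { return }
        scopes.leave()
        isActive = false
    }
}

/// Mirrors the buggy tests: the scope is entered but never left.
func leakyTest(_ scopes: ScopeStack) {
    let span = ToySpan(scopes: scopes)
    span.setActive()
}

/// Mirrors the fix: finish() always runs when the test body exits.
func balancedTest(_ scopes: ScopeStack) {
    let span = ToySpan(scopes: scopes)
    span.setActive()
    defer { span.finish() }
    // ... test assertions would go here ...
}

let leaked = ScopeStack()
for _ in 0..<3 { leakyTest(leaked) }

let balanced = ScopeStack()
for _ in 0..<3 { balancedTest(balanced) }
```

In this model the leaky variant leaves one extra scope behind per test, which is the kind of accumulated state that only becomes visible once execution order is randomized.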