Improve test suite reliability with randomization, TSan, and flaky test fixes by Valpertui · Pull Request #2823 · DataDog/dd-sdk-ios

Valpertui · 2026-04-07T16:47:34Z

What and why?

The iOS SDK test suite has been experiencing intermittent CI failures (~50% pass rate locally with make test-ios-all). This PR addresses the root causes of flakiness and adds infrastructure to detect order-dependent and thread-unsafe tests earlier.

How?

1. Enable test randomization across all 20 schemes
Adds randomExecutionOrdering = "YES" to every test scheme to surface order-dependent test failures that were hidden by deterministic execution.
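To illustrate the kind of failure randomization is meant to surface, here is a minimal sketch of two order-dependent "tests". It does not use XCTest, and `SharedRegistry`, `testRegistersValue`, `testRegistryStartsEmpty`, and `runAll` are hypothetical names that do not come from the SDK.

```swift
import Foundation

// Hypothetical shared state that outlives a single test (e.g. a static cache).
enum SharedRegistry {
    static var values: [String] = []
}

// "Test" A mutates the shared state and never cleans up after itself.
func testRegistersValue() -> Bool {
    SharedRegistry.values.append("a")
    return SharedRegistry.values.contains("a")
}

// "Test" B silently assumes that it starts from an empty registry.
func testRegistryStartsEmpty() -> Bool {
    return SharedRegistry.values.isEmpty
}

// Runs the given tests in order from a clean state; true if all of them pass.
func runAll(_ tests: [() -> Bool]) -> Bool {
    SharedRegistry.values = []
    // Evaluate every test (no short-circuit) so each one really runs.
    let results = tests.map { $0() }
    return results.allSatisfy { $0 }
}

let fixedOrderPasses = runAll([testRegistryStartsEmpty, testRegistersValue])
let otherOrderPasses = runAll([testRegistersValue, testRegistryStartsEmpty])
print("B then A:", fixedOrderPasses)
print("A then B:", otherOrderPasses)
```

With a deterministic order only one of these two sequences ever runs, so the missing cleanup in test A can go unnoticed indefinitely.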

2. Extend Thread Sanitizer coverage
Enables TSan on DatadogInternal, DatadogRUM, DatadogLogs, and DatadogTrace (iOS + tvOS) where it was previously missing. TSan was already enabled on DatadogCore, DatadogCrashReporting, and IntegrationTests.

3. Fix timing-sensitive tests
Tests with timeouts too tight for CI environments:

AppHangsWatchdogThreadTests: threshold 0.1s→0.5s, wait multiplier 10x→15x, duration tolerance uses Constants.tolerance + ciPadding
Profiling concurrency tests (CTorProfiler, MachSamplingProfiler, SafeRead, AppLaunchProfiler): timeout: 0.1→2.0 for concurrentPerform waits
AppStateManagerTests: 0.1→2.0 for async data store operations
DisplayLinkerTests: wait(during:) 0.1s→0.25s for CADisplayLink callbacks
VitalInfoSamplerTests: XCTAssertEqual(sampleCount, 2) → XCTAssertGreaterThanOrEqual (timer scheduling can produce extra samples)
WatchdogTerminationsMonitoringTests: added 10s deadline to unbounded polling loop
ViewHitchesIntegrationTests: increased wait for frame hitch generation
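The deadline added to the polling loop can be sketched roughly as follows. This is an illustrative helper, not the code from the PR: `pollUntil`, its parameters, and the 10 ms interval are assumptions.

```swift
import Foundation

/// Polls `condition` until it returns true or `timeout` seconds have elapsed.
/// Returns whether the condition was met before the deadline.
func pollUntil(
    timeout: TimeInterval,
    interval: TimeInterval = 0.01,
    condition: () -> Bool
) -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while Date() < deadline {
        if condition() { return true }
        Thread.sleep(forTimeInterval: interval)
    }
    // One last check so a condition met during the final sleep still counts.
    return condition()
}

// The condition becomes true on the third poll, well before the deadline.
var polls = 0
let met = pollUntil(timeout: 10) {
    polls += 1
    return polls >= 3
}

// A condition that never holds now fails after the deadline instead of hanging.
let neverMet = pollUntil(timeout: 0.05) { false }
print(met, neverMet)
```

In a test, the returned Bool would typically be asserted so that a missed deadline shows up as a failure rather than a hung CI job.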

4. Fix data races in URLSessionTaskStateSwizzlerTests
interceptedStates (plain Array) and interceptionCount (plain Int) were mutated from concurrent URLSession delegate callbacks — a genuine data race. Wrapped in ThreadSafeStates and ThreadSafeCounter. Also replaced Thread.sleep(1) with expectation-based waiting.
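For reference, a lock-protected wrapper of this kind might look like the sketch below. The actual ThreadSafeStates and ThreadSafeCounter in the PR may be implemented differently; `LockedCounter` and `LockedList` are hypothetical names, and the choice of NSLock is an assumption.

```swift
import Foundation

/// Hypothetical lock-protected counter; the PR's ThreadSafeCounter may differ.
final class LockedCounter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

/// Hypothetical lock-protected list of observed values.
final class LockedList<Element> {
    private let lock = NSLock()
    private var items: [Element] = []

    func append(_ item: Element) {
        lock.lock()
        defer { lock.unlock() }
        items.append(item)
    }

    var snapshot: [Element] {
        lock.lock()
        defer { lock.unlock() }
        return items
    }
}

let counter = LockedCounter()
let states = LockedList<Int>()

// Simulate callbacks arriving concurrently from several threads.
DispatchQueue.concurrentPerform(iterations: 1_000) { index in
    counter.increment()
    states.append(index)
}

print(counter.count, states.snapshot.count)
```

Every read and write goes through the same lock, so no increment or append is lost regardless of how the callbacks interleave.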

5. Fix flaky KSCrashBacktraceTests
testGenerateBacktraceForBackgroundThread was asserting that DatadogCrashReportingTests and Foundation appear in binary images, but the background thread was blocked on semaphore.wait() (only system frames on stack). Fixed by busy-spinning the thread inside a @inline(never) function in the test module, so user code frames are on the stack at capture time. Restores the user image assertion.
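The busy-spin pattern can be sketched as follows. This is an approximation under stated assumptions, not the test's actual code: `StopFlag` and `spinInUserCode` are hypothetical names, and the backtrace capture itself is omitted.

```swift
import Foundation

/// Lock-protected flag used to stop the spinning thread.
final class StopFlag {
    private let lock = NSLock()
    private var stopped = false

    func stop() {
        lock.lock()
        defer { lock.unlock() }
        stopped = true
    }

    var isStopped: Bool {
        lock.lock()
        defer { lock.unlock() }
        return stopped
    }
}

/// `@inline(never)` keeps this function as a distinct frame, so a backtrace
/// captured while the thread spins here includes a frame from this module.
@inline(never)
func spinInUserCode(until flag: StopFlag, started: DispatchSemaphore) {
    started.signal()
    while !flag.isStopped {}
}

let flag = StopFlag()
let started = DispatchSemaphore(value: 0)
let finished = DispatchSemaphore(value: 0)

let worker = Thread {
    spinInUserCode(until: flag, started: started)
    finished.signal()
}
worker.start()

// Wait until the worker is inside spinInUserCode. A test would capture the
// worker's backtrace at this point (the capture call is omitted here).
started.wait()

flag.stop()
let result = finished.wait(timeout: .now() + 5)
print(result == .success)
```

Signalling `started` from inside the spinning function means the capture only happens once the user-code frame is on the worker's stack.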

Local performance impact (10-run average):

| Category | Before | After | Delta |
| --- | --- | --- | --- |
| Full suite (11 modules) | 224.5s | 241.1s | +16.6s (+7.4%) |
| TSan-new modules (4) | 53.9s | 69.9s | +16.0s (+29.7%) |
| Other modules (7) | 170.6s | 171.2s | +0.6s (+0.4%)  |

The +16s overhead comes entirely from TSan instrumentation on the 4 newly-enabled modules. Timeout changes have near-zero wall-clock impact (they're ceilings, not delays).
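The "ceilings, not delays" point can be shown with a small sketch that uses a DispatchSemaphore in place of an XCTest expectation. XCTest's expectation waits likewise return as soon as the expectation is fulfilled. The 50 ms of simulated work is an arbitrary value chosen for illustration.

```swift
import Foundation

let done = DispatchSemaphore(value: 0)

// Simulated asynchronous work that completes after roughly 50 ms.
DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
    done.signal()
}

let start = Date()
// Generous 2-second ceiling: the wait returns as soon as the work signals.
let outcome = done.wait(timeout: .now() + 2.0)
let elapsed = Date().timeIntervalSince(start)

print(outcome == .success, elapsed < 2.0)
```

The full timeout is only spent when the awaited event never arrives, which is the failure case anyway.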

Review checklist

Feature or bugfix MUST have appropriate tests (unit, integration)
Make sure each commit and the PR mention the Issue number or JIRA reference
~~Add CHANGELOG entry for user facing changes~~ N/A — internal test infrastructure only
~~Add Objective-C interface for public APIs~~ N/A — no public API changes
~~Run make api-surface~~ N/A — no API changes

Enable randomExecutionOrdering on all 20 test schemes to surface order-dependent test failures. Add Thread Sanitizer to DatadogInternal, DatadogRUM, DatadogLogs, and DatadogTrace (iOS + tvOS) where it was previously missing.

Increase timeouts for tests that use real threading, timers, or concurrent dispatch where 0.1s ceilings are too tight for CI:

- AppHangsWatchdogThreadTests: raise threshold from 0.1s to 0.5s, widen wait multiplier from 10x to 15x, use Constants.tolerance + CI padding for duration assertions
- Profiling concurrency tests: 0.1s to 2.0s for concurrentPerform waits
- AppStateManagerTests: 0.1s to 2.0s for async data store operations
- DisplayLinkerTests: wait(during:) from 0.1s to 0.25s for CADisplayLink
- VitalInfoSamplerTests: use GreaterThanOrEqual for sample count
- AppHangsMonitoringTests: raise threshold and hang duration
- WatchdogTerminationsMonitoringTests: add 10s deadline to polling loop
- ViewHitchesTests: increase wait for frame hitch generation

…acktraceTests

URLSessionTaskStateSwizzlerTests: wrap interceptedStates and interceptionCount in thread-safe types (ThreadSafeStates, ThreadSafeCounter) to fix data races from concurrent URLSession callbacks. Replace Thread.sleep(1) with expectation-based waiting.

KSCrashBacktraceTests: fix testGenerateBacktraceForBackgroundThread by busy-spinning the background thread inside a @inline(never) user code function so the backtrace captures user binary image frames. Restores the assertion that DatadogCrashReportingTests appears in binary images.

Each test called span.setActive() which enters an os_activity scope, but never called span.finish() which leaves it. With randomized test execution, accumulated nested os_activity scopes corrupted the activity hierarchy, causing getActiveSpan() to return nil in subsequent tests.

Valpertui added 3 commits April 7, 2026 18:48

Valpertui force-pushed the valpertui/fix/tests-suite-improvements branch from 22814b2 to efb6319 Compare April 7, 2026 16:49

Valpertui wants to merge 4 commits into develop from valpertui/fix/tests-suite-improvements
