Add parallel gtest flake reproduction runner #14827
Add tools/gtest_parallel_repro.py for reproducing flakes that depend on process-level CPU contention and scheduler delays. The runner launches fresh gtest processes in parallel, isolates TEST_TMPDIR per process, optionally pins them to a small CPU set, summarizes failures, and can rebuild with COERCE_CONTEXT_SWITCH=1. Document the runner in CLAUDE.md so agents use it when single-process --gtest_repeat or COERCE_CONTEXT_SWITCH alone does not reproduce a CI-style flaky test.
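A minimal sketch of the runner's core loop, assuming only what the description states: each batch launches fresh processes in parallel, each with its own `TEST_TMPDIR`, and the loop can stop after the first batch that contains a failure. The function names (`run_batch`, `run_batches`) and the `batch_N/proc_N` directory layout are hypothetical and not taken from `tools/gtest_parallel_repro.py`; log capture and CPU pinning are omitted.

```python
import os
import subprocess


def run_batch(cmd, num_procs, base_dir):
    """Launch num_procs fresh copies of cmd in parallel, each with its own TEST_TMPDIR.

    Returns a list of (index, returncode, tmpdir) tuples once every process has exited.
    """
    procs = []
    for i in range(num_procs):
        tmpdir = os.path.join(base_dir, f"proc_{i}")
        os.makedirs(tmpdir, exist_ok=True)
        # Copy the parent environment and override TEST_TMPDIR for this child only.
        env = dict(os.environ, TEST_TMPDIR=tmpdir)
        # Each child gets its own session/process group so it can later be
        # signalled as a group with os.killpg().
        proc = subprocess.Popen(cmd, env=env, start_new_session=True)
        procs.append((i, tmpdir, proc))
    # All children are started before any wait, so they run concurrently.
    return [(i, proc.wait(), tmpdir) for i, tmpdir, proc in procs]


def run_batches(cmd, num_procs, num_batches, base_dir, stop_on_failure=False):
    """Run up to num_batches batches; optionally stop after the first batch with a failure."""
    all_results = []
    for batch in range(num_batches):
        results = run_batch(cmd, num_procs, os.path.join(base_dir, f"batch_{batch}"))
        all_results.append(results)
        if stop_on_failure and any(rc != 0 for _, rc, _ in results):
            break
    return all_results
```

Passing a gtest binary plus a `--gtest_filter` argument as `cmd` gives one fresh process per slot. That is the difference from single-process `--gtest_repeat`: every repetition pays full process startup and competes with its siblings for the CPU.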
| Check | Count |
|---|---|
| bugprone-argument-comment | 2 |
| Total | 2 |

- file/random_access_file_reader_test.cc (1 warning):
  `file/random_access_file_reader_test.cc:114:21: warning: argument name 'direct_io_buffer' in comment does not match parameter name 'direct_io_buffer_context' [bugprone-argument-comment]`
- table/block_fetcher.cc (1 warning):
  `table/block_fetcher.cc:359:23: warning: argument name 'direct_io_buffer' in comment does not match parameter name 'direct_io_buffer_context' [bugprone-argument-comment]`
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107758097.
🟡 Codex Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit 246fe5b. ❌ Codex review failed before producing findings. Generated by Codex CLI.
✅ Claude Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit 246fe5b.

Summary: A well-structured standalone Python tool for reproducing CI-specific gtest flakes. The core design is sound, but there are process cleanup gaps on interruption and a misleading runtime env var that is actually compile-time only.

High-severity findings (2):

🔴 HIGH
- H1. No signal handler: orphaned child processes on Ctrl+C
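One way to address H1, sketched under assumptions: keep a registry of live children and install a `SIGINT` handler that kills their process groups before unwinding. The handler is needed because children started with `start_new_session=True` sit in a different session and process group, so the `SIGINT` the terminal delivers on Ctrl+C reaches only the runner. `LIVE_PROCS`, `kill_all_groups`, `on_sigint`, and `install_sigint_handler` are hypothetical names, not the tool's code.

```python
import os
import signal

# Hypothetical registry of children that may still be running.
LIVE_PROCS = []


def kill_all_groups(sig=signal.SIGKILL):
    """Send sig to the process group of every child that has not exited yet."""
    for proc in LIVE_PROCS:
        if proc.poll() is None:
            try:
                # With start_new_session=True the child's PID is also its group ID.
                os.killpg(proc.pid, sig)
            except ProcessLookupError:
                pass  # the group exited between poll() and killpg()


def on_sigint(signum, frame):
    """SIGINT handler: stop the children first, then unwind as a normal Ctrl+C would."""
    kill_all_groups()
    raise KeyboardInterrupt


def install_sigint_handler():
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGINT, on_sigint)
```

In this sketch the runner would call `install_sigint_handler()` once at startup and append every `Popen` object to `LIVE_PROCS` as it is created.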
| Context | Relevant? | Notes |
|---|---|---|
| COERCE_CONTEXT_SWITCH | YES | Compile-time only; setting it as a runtime env var is a no-op (M1) |
| macOS | YES | taskset unavailable; clear error produced (L1) |
| Resource exhaustion | LOW RISK | 100 concurrent processes could hit FD limits on constrained systems, but this is a developer tool, not production |
| Disk space | LOW RISK | Many log directories created; acceptable for a debugging tool |
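The first two rows can be illustrated with a short sketch, assuming the build reads `COERCE_CONTEXT_SWITCH` as a `make` variable and that CPU pinning shells out to util-linux `taskset`. The helper names, the default job count, and the error text are hypothetical.

```python
import shutil


def build_command(target, coerce_context_switch=False, jobs=8):
    """Return a make invocation that (re)builds one test binary.

    COERCE_CONTEXT_SWITCH is passed to make because it only takes effect at
    compile time; exporting it when running an already-built test does nothing.
    """
    cmd = ["make", f"-j{jobs}"]
    if coerce_context_switch:
        cmd.append("COERCE_CONTEXT_SWITCH=1")
    cmd.append(target)
    return cmd


def pin_prefix(cpus, which=shutil.which):
    """Return a taskset prefix that pins a child to `cpus` (e.g. "0-1"), or [] for no pinning.

    Fails with a clear message when taskset is missing, as on macOS.
    """
    if not cpus:
        return []
    if which("taskset") is None:
        raise SystemExit("CPU pinning requires taskset (util-linux), which is not on PATH")
    return ["taskset", "-c", cpus]
```

A pinned child command would then be `pin_prefix(cpus) + gtest_cmd`; squeezing many processes onto a few CPUs is what produces the scheduler contention the runner is after.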
Positive Observations
- Clean separation of concerns (build, run, collect, report)
- Good use of `start_new_session=True` for process group isolation
- Thoughtful `TEST_TMPDIR` isolation per process
- JSONL output format is well-designed for downstream tooling
- The `--stop-on-failure` flag is a useful optimization
- Failure key extraction (gtest failures, sanitizer errors, assertions) is comprehensive
- The `build_if_requested` warning when `--coerce-context-switch` is used without `--build` is a nice touch
- CLAUDE.md documentation is clear and well-placed in the existing flakiness section
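A sketch of regex-based failure key extraction covering the three categories the review lists (gtest failures, sanitizer errors, assertions). The patterns are assumptions about typical gtest, sanitizer, and glibc `assert()` output, not the tool's actual regexes. Keys are de-duplicated per log because gtest prints each failed test twice: once inline and once in the final summary.

```python
import re

# Assumed output formats; the real tool's patterns may differ.
FAILURE_PATTERNS = [
    # gtest, e.g. "[  FAILED  ] EnvPosixTest.ReadAsyncQueueFull (3 ms)".
    # Requiring "Suite.Test" skips the "[  FAILED  ] 1 test, listed below:" line.
    re.compile(r"^\[  FAILED  \] ([\w/]+\.[\w/]+)", re.MULTILINE),
    # ASan/LSan/TSan reports, e.g. "==99==ERROR: AddressSanitizer: heap-use-after-free on address 0x...";
    # the key stops before " on " or " (" so addresses and PIDs do not split the histogram.
    re.compile(r"(?:ERROR|WARNING): (\w+Sanitizer: .+?)(?: on | \(|$)", re.MULTILINE),
    # glibc assert(), e.g. "env_test: io_posix.cc:10: void f(): Assertion `sqe' failed."
    re.compile(r"(Assertion `.*?' failed)"),
]


def extract_failure_keys(log_text):
    """Return the sorted, de-duplicated failure keys found in one process's output."""
    keys = set()
    for pattern in FAILURE_PATTERNS:
        keys.update(m.group(1) for m in pattern.finditer(log_text))
    return sorted(keys)
```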
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
- `/claude-review [context]`: Request a code review
- `/claude-query <question>`: Ask about the PR or codebase
Running tools/gtest_parallel_repro.py with multiple fresh processes exposed two independent test bugs.

First, c_test built most filesystem paths from geteuid(), and remote_compaction_null_callback_handling used the fixed rocksdb_c_test_null_service path. Parallel processes from the same user could therefore share directories and fail to open LOCK with "Resource temporarily unavailable". Add a process-unique test id on POSIX and use it for the C API test paths, including the null-service DB path.

Second, EnvPosixTest.ReadAsyncQueueFull simulated a full io_uring submission queue by overwriting the SQE pointer after io_uring_get_sqe() had already consumed a submission slot. That left stale state in the thread-local ring, so later AbortIO tests could process an unexpected completion and crash. Add a skip_io_uring_get_sqe sync point before the io_uring_get_sqe() call and use it from the test, so the Busy path is exercised without mutating the ring.

Validation: rebuilt c_test and env_test with COERCE_CONTEXT_SWITCH=1; reran c_test in 20 parallel processes, the queue-full/AbortIO env_test sequence in 20 parallel processes, the full env_test in 8 parallel processes, and the full 219-binary repro sweep. Failures that were only slow-test timeouts passed with a longer timeout or with gtest sharding. Also ran format-auto, check-sources, and git diff --check.
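The process-unique id from the first fix can be sketched in Python, assuming the `(euid << 32) ^ pid` formula quoted in a later review. Because a PID fits in the low 32 bits and the effective UID is shifted above them, the XOR never mixes the two fields, so distinct `(euid, pid)` pairs always yield distinct ids. `per_process_db_path` and its path format are hypothetical; the C test's actual naming may differ.

```python
import os
import tempfile


def get_test_id():
    """Process-unique id: effective UID in the high 32 bits, PID in the low 32 bits (POSIX only)."""
    return (os.geteuid() << 32) ^ os.getpid()


def per_process_db_path(name, base_dir=None):
    """Hypothetical helper: a per-process path such as <tmp>/rocksdb_c_test-<id>-<name>."""
    base_dir = base_dir or tempfile.gettempdir()
    return os.path.join(base_dir, f"rocksdb_c_test-{get_test_id()}-{name}")
```

Two processes run by the same user now differ in the low 32 bits, so they no longer contend for the same `LOCK` file.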
🟡 Codex Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit 9bd41f0. ❌ Codex review failed before producing findings. Generated by Codex CLI.
✅ Claude Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit 9bd41f0.

Summary: This PR adds a parallel gtest flake reproduction runner, fixes an io_uring ring corruption bug in the …

High-severity findings (0): No high-severity findings.

🔴 HIGH: None.

🟡 MEDIUM
- M1. Python script lacks unit tests
| Context | Code executes? | Assumptions hold? | Action needed? |
|---|---|---|---|
| Release builds | YES (sync points are no-ops) | YES | None |
| Non-io_uring platforms | NO (early return) | N/A | None |
| WritePreparedTxnDB | NO (sync points unrelated) | N/A | None |
| Multi-threaded test runners | YES | YES (thread-local ring) | None |
Positive Observations
- The boolean skip sync point pattern matches established precedent in `block_based_table_builder.cc` and `plain_table_builder.cc`.
- `TEST_SYNC_POINT_CALLBACK` compiles to nothing in release builds, so it adds zero overhead (verified).
- The `c_test` isolation fix using `(euid << 32) ^ pid` is simple and effective for parallel execution.
- Process management in the Python script is robust: it uses `start_new_session=True` for process group isolation, `os.killpg()` for group termination, and proper SIGTERM-then-SIGKILL escalation with timeout handling.
- The Python script includes useful features: failure key extraction via regex, JSONL output, histogram summarization, and `--stop-on-failure` for interactive use.
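The SIGTERM-then-SIGKILL escalation the review describes can be sketched as follows, assuming each child was started with `start_new_session=True`, so that its PID is also its process group ID. `terminate_group` and `run_with_timeout` are hypothetical names, and the default grace period is arbitrary.

```python
import os
import signal
import subprocess


def terminate_group(proc, grace_seconds=5.0):
    """Stop a child started with start_new_session=True and anything left in its process group.

    Sends SIGTERM to the whole group, waits up to grace_seconds, then
    escalates to SIGKILL. Returns the child's exit status.
    """
    if proc.poll() is not None:
        return proc.returncode
    try:
        # The child is a session leader, so its PID doubles as its process group ID.
        os.killpg(proc.pid, signal.SIGTERM)
        return proc.wait(timeout=grace_seconds)
    except ProcessLookupError:
        return proc.wait()
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        return proc.wait()


def run_with_timeout(cmd, timeout_seconds, grace_seconds=5.0):
    """Run cmd in its own session; return (returncode, timed_out)."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        return terminate_group(proc, grace_seconds), True
```

Signalling the group rather than the single PID matters for tests that spawn helper processes: those helpers share the group and are stopped along with the test binary.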
🟡 Codex Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit 35c9fcb. ❌ Codex review failed before producing findings. Generated by Codex CLI.
✅ Claude Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit 35c9fcb.

Summary: This PR correctly fixes an io_uring ring corruption bug in the …

High-severity findings (0): No high-severity findings.

🔴 HIGH: No high-severity findings.

🟡 MEDIUM
- M1. Missing copyright/license header in …
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (via table readers) | YES (sync points are no-op in release) | None |
| ReadOnly DB | YES (heavy table reading) | YES | None |
| CompactionService | YES (reads SST files) | YES | None |
| User-defined timestamps | No direct impact | N/A | None |
| MemPurge | No async I/O | N/A | None |
| BlobDB | YES (via table readers) | YES | None |
Ring corruption mechanism verified:
- `io_uring_get_sqe()` advances the internal SQ tail pointer.
- Nulling the returned `sqe` pointer does NOT undo the tail advancement.
- A subsequent `io_uring_submit()` processes the stale/uninitialized SQE slot.
- The thread-local ring is shared across all tests in the `TestAsyncRead` fixture.
- The new `skip_io_uring_get_sqe` approach prevents this by skipping the call entirely.
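The mechanism can be illustrated with a toy model of the submission queue's user-space bookkeeping. This is not liburing: `ToySubmissionQueue` only models the idea that `get_sqe()` advances the tail when it hands out a slot and `submit()` publishes everything between head and tail. The injection functions are hypothetical stand-ins for the old and new test approaches. The extra `"<stale>"` entry in the old path plays the role of the unexpected completion that later `AbortIO` tests hit.

```python
class ToySubmissionQueue:
    """Toy model of an io_uring submission queue's user-space bookkeeping (not liburing)."""

    def __init__(self, entries):
        self.entries = entries
        # Slots keep whatever an earlier request left in them; nothing clears them.
        self.slots = ["<stale>"] * entries
        self.sqe_head = 0  # first slot not yet submitted
        self.sqe_tail = 0  # next slot get_sqe() will hand out

    def get_sqe(self):
        """Hand out the next free slot index, or None if the queue is full."""
        if self.sqe_tail - self.sqe_head == self.entries:
            return None
        idx = self.sqe_tail % self.entries
        self.sqe_tail += 1  # the slot is consumed here, before the caller fills it in
        return idx

    def submit(self):
        """Publish every slot between head and tail and return their contents."""
        submitted = [self.slots[i % self.entries] for i in range(self.sqe_head, self.sqe_tail)]
        self.sqe_head = self.sqe_tail
        return submitted


def old_queue_full_injection(ring):
    # Old test approach: get_sqe() runs, then the test overwrites the result with NULL.
    ring.get_sqe()
    return None


def skip_get_sqe_injection(ring, skip_get_sqe=True):
    # New approach: when the test asks to skip, get_sqe() is never called.
    if skip_get_sqe:
        return None
    return ring.get_sqe()


def read_once(ring, request, inject):
    """One test hits the simulated full queue, then a later read submits normally."""
    inject(ring)  # returns None, so the caller would take the Busy path
    idx = ring.get_sqe()
    ring.slots[idx] = request
    return ring.submit()
```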
Production impact: NONE. `TEST_SYNC_POINT_CALLBACK` is `#define`d to nothing in release builds (when `NDEBUG` is defined). The compiler can therefore fold `bool skip_get_sqe` to `false` and drop the `if (!skip_get_sqe)` check via constant propagation and dead code elimination, leaving an unconditional `io_uring_get_sqe()` call.
Positive Observations
- Elegant fix: The `skip_io_uring_get_sqe` pattern cleanly separates test injection from ring manipulation. Retaining the old sync point inside the `if` block preserves backward compatibility.
- Thorough `c_test.c` update: All 10 `snprintf` calls using `geteuid()` are updated, plus `null_db` gets a proper temp-dir path with overflow checking.
- Well-designed Python tool: `gtest_parallel_repro.py` uses `start_new_session=True` for process group management, proper `SIGTERM`/`SIGKILL` escalation, isolated `TEST_TMPDIR` per process, `subprocess.Popen` with argument lists (no `shell=True`), and structured JSONL output.
- Zero performance impact in release builds, confirmed by compiler optimization analysis.
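A sketch of the structured JSONL output and the failure-key histogram the reviews mention. The record fields (`batch`, `proc`, `returncode`, `failure_keys`) are hypothetical; only the `failures.jsonl` file name comes from the PR summary. Appending one JSON object per line keeps every completed record parseable even if a run is interrupted, and the reader below skips a truncated last line instead of failing.

```python
import json
from collections import Counter


def append_failure_record(path, record):
    """Append one failure as a single JSON object per line (JSONL)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def summarize_failures(path):
    """Return a histogram of failure keys from a JSONL file, most common first."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a last line cut short by an interrupted run
            counts.update(record.get("failure_keys", []))
    return counts.most_common()
```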
🟡 Codex Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit b108c3f. ❌ Codex review failed before producing findings. Generated by Codex CLI.
✅ Claude Code Review (obsolete, superseded by a newer AI review): auto-triggered after CI passed, reviewing commit b108c3f.

Summary: Solid test infrastructure PR that fixes a real io_uring SyncPoint bug, improves c_test parallel isolation, and adds a useful flake reproduction tool. The io_uring fix is correct and well-motivated. The c_test changes are mechanically sound but have a minor, pre-existing Windows-specific concern. The Python tool is well-structured.

High-severity findings (0): No high-severity findings.

🟡 MEDIUM
- M1. Windows …
🟡 Codex Code Review: auto-triggered after CI passed, reviewing commit 1e5cfc6. ❌ Codex review failed before producing findings. Generated by Codex CLI.
✅ Claude Code Review: auto-triggered after CI passed, reviewing commit 1e5cfc6.

Summary: This PR adds a well-designed parallel gtest flake reproduction tool, fixes a real flaky test bug in …

High-severity findings (0). Medium-severity findings (1):

🟡 MEDIUM
- M1. Windows …
| Context | Affected? | Analysis |
|---|---|---|
| Thread-local io_uring | YES | Fix correctly avoids consuming an SQ slot. io_uring is thread-local (IORING_SETUP_SINGLE_ISSUER), no cross-thread concerns. |
| Parallel test execution | YES | GetTestId() and gtest_parallel_repro.py both target this. UID+PID combination ensures path uniqueness. |
| Production builds | NO | TEST_SYNC_POINT_CALLBACK compiles to nothing with NDEBUG. Zero performance impact. |
Positive Observations
- The io_uring fix correctly addresses the root cause: overwriting the SQE pointer after `io_uring_get_sqe()` consumed a slot left stale submissions in the ring.
- The Python tool has proper signal handling, process group management (`start_new_session=True` + `os.killpg`), and cleanup.
- The `null_db` `snprintf` bounds check is correct defensive coding.
Summary

- Add `tools/gtest_parallel_repro.py` to reproduce gtest flaky tests that depend on process-level CPU contention and scheduler delays.
- Launch fresh gtest processes in parallel, isolate each process's `TEST_TMPDIR`, and optionally pin the workload to a smaller CPU set with `taskset`.
- Write failures to `failures.jsonl`, summarize failure keys, support stopping on the first failed batch, and optionally rebuild with `COERCE_CONTEXT_SWITCH=1`.
- Document the runner in `CLAUDE.md` for CI-style flaky tests that single-process `--gtest_repeat` or coerce mode alone does not reproduce.

Test Plan