feat(skills): unify TPC-H runner via performance_test.py --nsys-profile by bwyogatama · Pull Request #882 · sirius-db/sirius

bwyogatama · 2026-06-04T22:57:25Z

Summary

Make test/tpch_performance/performance_test.py the single canonical TPC-H runner shared by the benchmark, profile-analyzer, and optimization-advisor skills. Previously, profile-analyzer recommended a separate set of shell runners (profile_tpch_nsys.sh, run_tpch_parquet.sh, benchmark_and_validate.sh) for the same workload — duplicating logic and using a different query source. It also pointed users at flags/scripts that don't exist (--db-path, profile_tpch_nsys_duckdb_native.sh).

After this PR all three skills point at the same Python runner. Profiling goes through nsys_report.sh, which now delegates to performance_test.py --nsys-profile and flattens per-query outputs so the existing nsys_analyze.sh / nsys_compare.sh tooling keeps working unchanged.

What changed

test/tpch_performance/performance_test.py (+310 lines)
- New --nsys-profile flag: subprocess-per-query DuckDB CLI wrapped in nsys profile with --capture-range=cudaProfilerApi.
- Helpers _build_views_sql (factored from open_connection), _build_nsys_temp_sql, run_nsys_profile.
- Per-query outputs under <bench>/sirius/q<N>/{nsys.nsys-rep, nsys.sqlite, nsys.sql, timings.csv, log_dir/}.
- New --query-timeout flag (default 90s, only for --nsys-profile).
- metadata.json gains nsys_profile field.
- Reuses existing queries.py, pin SQL, parquet resolution, and csv/runtimes.csv format.
test/tpch_performance/nsys_report.sh
- Phase 2 delegates to pixi run python performance_test.py --nsys-profile instead of profile_tpch_nsys.sh, then flattens <bench>/sirius/q<N>/{nsys.sqlite, nsys.nsys-rep, timings.csv, nsys_stdout.txt} into the legacy flat profiles/q<N>.<ext> layout so Phase 3/4 (analyze + metric extraction) work unchanged.
- Adds --parquet-dir (default \$PROJECT_DIR/test_datasets/tpch_parquet_sf<SF>).
- Removes --cache-level (set scan_cache_level in the Sirius YAML config instead).
.claude/skills/profile-analyzer/SKILL.md
- Drops broken --db-path and profile_tpch_nsys_duckdb_native.sh references.
- Workflow A Step 2 (non-profiled timing) now uses performance_test.py (the same runner the benchmark skill uses).
- Reframes NVTX domain numbering as discovered from each profile at analysis time (not Sirius-registered).
- Drops hardcoded "30-40 CUDA streams" (configurable via pipeline.num_threads).
- Tweaks "24 on Ada Lovelace" to note nsys_analyze.sh reads maxBlocksPerSm dynamically.
.claude/skills/optimization-advisor/SKILL.md
- Three call sites (Profiling Overhead Warning, Validating Optimizations, Output Format reminder) now reference performance_test.py instead of run_tpch_parquet.sh / benchmark_and_validate.sh.
test/tpch_performance/CLAUDE.md
- Running Benchmarks section restructured around performance_test.py (covers --engine, --iterations, --mode, --queries, --pin, --validation, --nsys-profile, --query-timeout, etc.).
- Profiling section now points at performance_test.py --nsys-profile as the primary entry; nsys_report.sh documented as the orchestrator.
- Key Files table restructured: performance_test.py promoted to top; legacy shell runners demoted to a "Legacy (kept for CI)" group at the bottom.

Out of scope

Legacy shell runners (benchmark_and_validate.sh, run_tpch_parquet.sh, profile_tpch_nsys.sh, run_tpch_legacy.sh, etc.) stay in the tree because .github/workflows/test.yml:156 and .ai-helper/commands.yaml:34/44/78 still call them. Those callers should be migrated to performance_test.py in a follow-up PR before the shell runners are deleted.

Follow-up fixes landed on this branch

6923e78 fix(skills): pass -unsigned to DuckDB CLI in --nsys-profile mode — caught while smoke-testing nsys_report.sh end-to-end. Without -unsigned, the DuckDB CLI rejects the locally-built unsigned Sirius extension. The Python runner already handles this via allow_unsigned_extensions=true in open_connection; mirror that for the subprocess.

Test plan

Static checks pass — python3 -c \"import ast; ast.parse(open('test/tpch_performance/performance_test.py').read())\" and bash -n test/tpch_performance/nsys_report.sh both clean.
Pre-commit clean — black, codespell, end-of-file-fixer, trim trailing whitespace, smartquote, debug statements, etc. all green on the changed files (verified by the commit hook running on both c7b93f2 and 6923e78). black auto-reformatted performance_test.py once during the initial commit; subsequent runs are clean.
--nsys-profile argparse contract verified — rejected with the expected error for: --engine cpu, --engine both, --validation, --mode sequential, --duckdb-profiling. --help lists both new flags.
Non-profile baseline (CPU side) unchanged — performance_test.py --input ~/sirius/test_datasets/tpch_parquet_sf1 --engine cpu --queries 1,3,6 --iterations 1 produces the documented layout: config.yml, metadata.json (with new nsys_profile: false field), csv/runtimes.csv with the expected engine,query,iteration,runtime_s rows, duckdb/q<N>/result.txt.
New --nsys-profile mode — code path — flag validation passes, run_nsys_profile is invoked, nsys.sql is generated with the exact layout from profile_tpch_nsys.sh:170-208 (views → optional pin → cold iter → profiler_start → hot iter(s) → profiler_stop → COPY timings to per-query timings.csv), per-query subdir (sirius/q<N>/{nsys.sql, nsys_stdout.txt, log_dir/}) created, subprocess spawned and exit-code handled (NaN rows recorded in runtimes.csv on failure).
End-to-end orchestrator — code path — nsys_report.sh --sf 1 --parquet-dir ... --output-dir ... --iterations 2 1 3 6 runs all phases: PARQUET_DIR resolution from --sf, pixi run python performance_test.py --nsys-profile invocation with the correct flag set, <REPORT_DIR>/profiles_tree/sirius/q<N>/ populated by the Python runner, [Phase 2] Flattening per-query outputs into <REPORT_DIR>/profiles/ step reached.
Successful GPU run end-to-end — produces sirius/q<N>/nsys.{nsys-rep, sqlite} plus a report.md / summary.json / metadata.json via nsys_report.sh. Pending — needs to be run on a host with the NVIDIA driver loaded outside the verification sandbox.
Existing analyze/compare tools unchanged on the new outputs — nsys_analyze.sh reports/<ts>/profiles/ and nsys_compare.sh reports/<a> reports/<b>. Pending — depends on the GPU run above producing real .sqlite files.
CI green — the legacy benchmark_and_validate.sh 1 SF1 step continues to work in .github/workflows/test.yml. Pending — will be reflected by the GitHub Actions check on this PR.

🤖 Generated with Claude Code

Make performance_test.py the single canonical TPC-H runner shared by the benchmark, profile-analyzer, and optimization-advisor skills. - Add --nsys-profile to performance_test.py: subprocess-per-query DuckDB CLI wrapped in `nsys profile`, producing per-query .nsys-rep + .sqlite under <bench>/sirius/q<N>/. Reuses existing parquet-view setup, queries.py, and pin SQL. - nsys_report.sh now delegates Phase 2 to performance_test.py --nsys-profile and flattens per-query outputs into profiles/ so nsys_analyze.sh and Phase 4 metric extraction continue to work unchanged. Adds --parquet-dir; removes --cache-level (use Sirius YAML config instead). - profile-analyzer SKILL.md: drop broken --db-path / profile_tpch_nsys_duckdb_native.sh references; Workflow A Step 2 now uses performance_test.py; reframe NVTX domain numbering as discovered at analysis time; drop hardcoded "30-40 CUDA streams" and "24 on Ada Lovelace" claims. - optimization-advisor SKILL.md: three call sites now reference performance_test.py instead of run_tpch_parquet.sh / benchmark_and_validate.sh. - test/tpch_performance/CLAUDE.md: restructure Running Benchmarks around performance_test.py; demote shell runners to a "Legacy (kept for CI)" group in Key Files. Legacy shell runners (benchmark_and_validate.sh, run_tpch_parquet.sh, profile_tpch_nsys.sh, etc.) stay in tree for backward compat with .github/workflows/test.yml and .ai-helper/commands.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Without -unsigned, the DuckDB CLI rejects locally-built Sirius extensions with "signature is either missing or invalid and unsigned extensions are disabled by configuration". The Python runner already handles this via `allow_unsigned_extensions=true` in the duckdb.connect config (open_connection at performance_test.py:271); mirror that for the subprocess so nsys profiling works against an unsigned dev build. Caught while smoke-testing nsys_report.sh end-to-end against the local build/release/extension/sirius/sirius.duckdb_extension. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bwyogatama and others added 2 commits June 4, 2026 17:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(skills): unify TPC-H runner via performance_test.py --nsys-profile#882

feat(skills): unify TPC-H runner via performance_test.py --nsys-profile#882
bwyogatama wants to merge 2 commits into
sirius-db:devfrom
bwyogatama:worktree-consolidate-skills

bwyogatama commented Jun 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bwyogatama commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Out of scope

Follow-up fixes landed on this branch

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bwyogatama commented Jun 4, 2026 •

edited

Loading