feat(skills): unify TPC-H runner via performance_test.py --nsys-profile #882
Open
bwyogatama wants to merge 2 commits into
Make performance_test.py the single canonical TPC-H runner shared by the benchmark, profile-analyzer, and optimization-advisor skills.

- Add --nsys-profile to performance_test.py: subprocess-per-query DuckDB CLI wrapped in `nsys profile`, producing per-query .nsys-rep + .sqlite under <bench>/sirius/q<N>/. Reuses existing parquet-view setup, queries.py, and pin SQL.
- nsys_report.sh now delegates Phase 2 to performance_test.py --nsys-profile and flattens per-query outputs into profiles/ so nsys_analyze.sh and Phase 4 metric extraction continue to work unchanged. Adds --parquet-dir; removes --cache-level (use Sirius YAML config instead).
- profile-analyzer SKILL.md: drop broken --db-path / profile_tpch_nsys_duckdb_native.sh references; Workflow A Step 2 now uses performance_test.py; reframe NVTX domain numbering as discovered at analysis time; drop hardcoded "30-40 CUDA streams" and "24 on Ada Lovelace" claims.
- optimization-advisor SKILL.md: three call sites now reference performance_test.py instead of run_tpch_parquet.sh / benchmark_and_validate.sh.
- test/tpch_performance/CLAUDE.md: restructure Running Benchmarks around performance_test.py; demote shell runners to a "Legacy (kept for CI)" group in Key Files.

Legacy shell runners (benchmark_and_validate.sh, run_tpch_parquet.sh, profile_tpch_nsys.sh, etc.) stay in tree for backward compat with .github/workflows/test.yml and .ai-helper/commands.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
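The flattening step described in this commit could look roughly like the sketch below. It is a Python illustration only, not the code in nsys_report.sh (which is a shell script). The function name `flatten_profiles` and the rule "flat name = q<N> plus the source file's suffix" are assumptions made for the example; the per-query file names are taken from the PR description.

```python
import shutil
import tempfile
from pathlib import Path

# Per-query files named in the PR description.
PER_QUERY_FILES = ["nsys.sqlite", "nsys.nsys-rep", "timings.csv", "nsys_stdout.txt"]


def flatten_profiles(tree_dir: Path, flat_dir: Path) -> list[Path]:
    """Copy <tree_dir>/sirius/q<N>/<file> into <flat_dir>/q<N><suffix>.

    Hypothetical helper: the real flattening lives in nsys_report.sh.
    """
    flat_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for query_dir in sorted((tree_dir / "sirius").glob("q*")):
        for name in PER_QUERY_FILES:
            src = query_dir / name
            if not src.is_file():
                continue  # a failed query may not have produced every file
            # ASSUMPTION: the flat file is named q<N> + the source suffix,
            # e.g. sirius/q6/nsys.sqlite -> profiles/q6.sqlite.
            dst = flat_dir / f"{query_dir.name}{src.suffix}"
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Calling `flatten_profiles(Path("<REPORT_DIR>/profiles_tree"), Path("<REPORT_DIR>/profiles"))` would leave the per-query tree intact and only add flat copies, which is one way to keep downstream tools that expect `profiles/q<N>.<ext>` working.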
Without -unsigned, the DuckDB CLI rejects locally-built Sirius extensions with "signature is either missing or invalid and unsigned extensions are disabled by configuration". The Python runner already handles this via `allow_unsigned_extensions=true` in the duckdb.connect config (open_connection at performance_test.py:271); mirror that for the subprocess so nsys profiling works against an unsigned dev build.

Caught while smoke-testing nsys_report.sh end-to-end against the local build/release/extension/sirius/sirius.duckdb_extension.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
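As a rough illustration of the fix in this commit, the sketch below assembles a command line in which the DuckDB CLI receives `-unsigned` while running under `nsys profile`. The helper name `build_nsys_command` and its parameters are hypothetical. `--capture-range=cudaProfilerApi` and `-unsigned` come from the PR text; the remaining nsys options (`--force-overwrite`, `--export`, `-o`) and the `-c ".read <file>"` invocation are assumptions about how such a command might be put together, not a copy of run_nsys_profile.

```python
def build_nsys_command(duckdb_cli, sql_file, output_base, unsigned=True):
    """Return an argv list for `nsys profile` wrapping the DuckDB CLI.

    Hypothetical helper for illustration; the command is built, not executed.
    """
    duckdb_args = [duckdb_cli]
    if unsigned:
        # CLI counterpart of the Python-side setting
        #   duckdb.connect(config={"allow_unsigned_extensions": "true"})
        # so a locally built, unsigned extension can be loaded.
        duckdb_args.append("-unsigned")
    # ASSUMPTION: the generated SQL file is run through the `.read` dot command.
    duckdb_args += ["-c", f".read {sql_file}"]

    return [
        "nsys", "profile",
        "--capture-range=cudaProfilerApi",  # from the PR description
        "--force-overwrite=true",           # assumed for the sketch
        "--export=sqlite",                  # assumed: emit .sqlite next to .nsys-rep
        "-o", output_base,
        *duckdb_args,
    ]
```

Passing the result to `subprocess.run` as a list avoids shell quoting issues. Note that `-unsigned` has to come after the DuckDB binary in the list, since it is an option of the DuckDB CLI and not of nsys.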
Summary
Make `test/tpch_performance/performance_test.py` the single canonical TPC-H runner shared by the `benchmark`, `profile-analyzer`, and `optimization-advisor` skills. Previously, `profile-analyzer` recommended a separate set of shell runners (`profile_tpch_nsys.sh`, `run_tpch_parquet.sh`, `benchmark_and_validate.sh`) for the same workload, duplicating logic and using a different query source. It also pointed users at flags/scripts that don't exist (`--db-path`, `profile_tpch_nsys_duckdb_native.sh`).

After this PR all three skills point at the same Python runner. Profiling goes through `nsys_report.sh`, which now delegates to `performance_test.py --nsys-profile` and flattens per-query outputs so the existing `nsys_analyze.sh` / `nsys_compare.sh` tooling keeps working unchanged.

What changed
`test/tpch_performance/performance_test.py` (+310 lines)
- `--nsys-profile` flag: subprocess-per-query DuckDB CLI wrapped in `nsys profile` with `--capture-range=cudaProfilerApi`.
- New helpers: `_build_views_sql` (factored from `open_connection`), `_build_nsys_temp_sql`, `run_nsys_profile`.
- Per-query output: `<bench>/sirius/q<N>/{nsys.nsys-rep, nsys.sqlite, nsys.sql, timings.csv, log_dir/}`.
- `--query-timeout` flag (default 90s, only for `--nsys-profile`).
- `metadata.json` gains `nsys_profile` field.
- Reuses existing `queries.py`, pin SQL, parquet resolution, and `csv/runtimes.csv` format.

`test/tpch_performance/nsys_report.sh`
- Phase 2 now calls `pixi run python performance_test.py --nsys-profile` instead of `profile_tpch_nsys.sh`, then flattens `<bench>/sirius/q<N>/{nsys.sqlite, nsys.nsys-rep, timings.csv, nsys_stdout.txt}` into the legacy flat `profiles/q<N>.<ext>` layout so Phase 3/4 (analyze + metric extraction) work unchanged.
- Adds `--parquet-dir` (default `$PROJECT_DIR/test_datasets/tpch_parquet_sf<SF>`).
- Removes `--cache-level` (set `scan_cache_level` in the Sirius YAML config instead).

`.claude/skills/profile-analyzer/SKILL.md`
- Drops broken `--db-path` and `profile_tpch_nsys_duckdb_native.sh` references.
- Workflow A Step 2 now uses `performance_test.py` (the same runner the `benchmark` skill uses).
- `pipeline.num_threads`).
- `nsys_analyze.sh` reads `maxBlocksPerSm` dynamically.

`.claude/skills/optimization-advisor/SKILL.md`
- Three call sites now reference `performance_test.py` instead of `run_tpch_parquet.sh` / `benchmark_and_validate.sh`.

`test/tpch_performance/CLAUDE.md`
- Running Benchmarks restructured around `performance_test.py` (covers `--engine`, `--iterations`, `--mode`, `--queries`, `--pin`, `--validation`, `--nsys-profile`, `--query-timeout`, etc.).
- `performance_test.py --nsys-profile` as the primary entry; `nsys_report.sh` documented as the orchestrator.
- Key Files: `performance_test.py` promoted to top; legacy shell runners demoted to a "Legacy (kept for CI)" group at the bottom.

Out of scope
Legacy shell runners (`benchmark_and_validate.sh`, `run_tpch_parquet.sh`, `profile_tpch_nsys.sh`, `run_tpch_legacy.sh`, etc.) stay in the tree because `.github/workflows/test.yml:156` and `.ai-helper/commands.yaml:34/44/78` still call them. Those callers should be migrated to `performance_test.py` in a follow-up PR before the shell runners are deleted.

Follow-up fixes landed on this branch
- `fix(skills): pass -unsigned to DuckDB CLI in --nsys-profile mode`: caught while smoke-testing `nsys_report.sh` end-to-end. Without `-unsigned`, the DuckDB CLI rejects the locally-built unsigned Sirius extension. The Python runner already handles this via `allow_unsigned_extensions=true` in `open_connection`; mirror that for the subprocess.

Test plan
- `python3 -c "import ast; ast.parse(open('test/tpch_performance/performance_test.py').read())"` and `bash -n test/tpch_performance/nsys_report.sh` both clean.
- `black`, `codespell`, `end-of-file-fixer`, `trim trailing whitespace`, smartquote, debug statements, etc. all green on the changed files (verified by the commit hook running on both `c7b93f2` and `6923e78`). `black` auto-reformatted `performance_test.py` once during the initial commit; subsequent runs are clean.
- `--nsys-profile` argparse contract verified: rejected with the expected error for `--engine cpu`, `--engine both`, `--validation`, `--mode sequential`, `--duckdb-profiling`. `--help` lists both new flags.
- `performance_test.py --input ~/sirius/test_datasets/tpch_parquet_sf1 --engine cpu --queries 1,3,6 --iterations 1` produces the documented layout: `config.yml`, `metadata.json` (with new `nsys_profile: false` field), `csv/runtimes.csv` with the expected `engine,query,iteration,runtime_s` rows, `duckdb/q<N>/result.txt`.
- `--nsys-profile` mode, code path: flag validation passes, `run_nsys_profile` is invoked, `nsys.sql` is generated with the exact layout from `profile_tpch_nsys.sh:170-208` (views → optional pin → cold iter → `profiler_start` → hot iter(s) → `profiler_stop` → COPY timings to per-query `timings.csv`), per-query subdir (`sirius/q<N>/{nsys.sql, nsys_stdout.txt, log_dir/}`) created, subprocess spawned and exit-code handled (NaN rows recorded in `runtimes.csv` on failure).
- `nsys_report.sh --sf 1 --parquet-dir ... --output-dir ... --iterations 2 1 3 6` runs all phases: PARQUET_DIR resolution from `--sf`, `pixi run python performance_test.py --nsys-profile` invocation with the correct flag set, `<REPORT_DIR>/profiles_tree/sirius/q<N>/` populated by the Python runner, `[Phase 2] Flattening per-query outputs into <REPORT_DIR>/profiles/` step reached.
- `sirius/q<N>/nsys.{nsys-rep, sqlite}` plus a `report.md` / `summary.json` / `metadata.json` via `nsys_report.sh`. Pending: needs to be run on a host with the NVIDIA driver loaded outside the verification sandbox.
- `nsys_analyze.sh reports/<ts>/profiles/` and `nsys_compare.sh reports/<a> reports/<b>`. Pending: depends on the GPU run above producing real `.sqlite` files.
- `benchmark_and_validate.sh 1` SF1 step continues to work in `.github/workflows/test.yml`. Pending: will be reflected by the GitHub Actions check on this PR.

🤖 Generated with Claude Code
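The "exit-code handled (NaN rows recorded in runtimes.csv on failure)" behaviour from the test plan can be sketched as below. This is a minimal illustration under stated assumptions, not the logic of run_nsys_profile: the helper names are hypothetical, and wall-clock time of the subprocess stands in for the per-query timings that the real runner is described as reading from `timings.csv`. The column names and the 90s default timeout are taken from the PR description.

```python
import csv
import io
import math
import subprocess
import sys
import time

# Column order from the PR description of csv/runtimes.csv.
FIELDS = ["engine", "query", "iteration", "runtime_s"]


def timed_run(cmd, timeout_s=90.0):
    """Run cmd; return elapsed seconds, or NaN on timeout / non-zero exit.

    ASSUMPTION: wall-clock time is a stand-in for the runner's real timings.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return math.nan
    if proc.returncode != 0:
        return math.nan
    return time.perf_counter() - start


def record_row(writer, query, iteration, runtime_s, engine="sirius"):
    """Append one result row; a failed query is kept as a NaN row, not dropped."""
    writer.writerow(
        {"engine": engine, "query": query, "iteration": iteration, "runtime_s": runtime_s}
    )


# Usage: a child process that exits with status 3 is recorded as NaN.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
failed = timed_run([sys.executable, "-c", "raise SystemExit(3)"])
record_row(writer, query=6, iteration=1, runtime_s=failed)
```

Keeping a NaN row for every failed or timed-out query means the CSV always has one row per (query, iteration), so later comparison steps can tell "failed" apart from "not run".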