
chore: add script to run gluten/spark ut with bolt and add to ci #586

Draft
zhangxffff wants to merge 4 commits into bytedance:main from zhangxffff:run_gluten_ut


zhangxffff (Collaborator) commented May 24, 2026

What problem does this PR solve?

Issue Number: close #588

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

This PR introduces a script that runs the Gluten unit tests (UT) with the Bolt backend and adds it to the Bolt CI.

Added files:

  1. scripts/gluten_ut/run.sh: runs the Gluten UT in parallel.
  2. scripts/gluten_ut/blacklist.txt: a blacklist of known failing cases.
  3. scripts/gluten_ut/slow_suites.txt: slow Gluten test suites that are dispatched first to reduce the cost of the parallel run.
  4. .github/workflows/bolt_spark_ut.yml: workflow that runs the Gluten UT in Bolt CI.

Usage

export GLUTEN_HOME=/directory_of_gluten
export SPARK_HOME=/directory_of_spark_3_5_5_src
scripts/gluten_ut/run.sh

Key design

Gluten's mvn test runs through the scalatest-maven-plugin, which doesn't parallelize across suites. The full UT matrix therefore takes about 2 hours end to end, which is too slow for the Bolt CI loop. This PR runs each suite as its own mvn invocation in parallel, cutting wall time to roughly 30 minutes with 8 parallel jobs (JOBS=8) in CI.
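A rough bound shows why the long tail matters. Taking the ~2-hour serial run as a proxy for the total work W (an approximation that ignores the extra mvn start-up cost each per-suite invocation adds), the wall time T on J workers can beat neither the evenly split work nor the longest single suite t_max:

```latex
T \;\ge\; \max\!\left(\frac{W}{J},\; t_{\max}\right),
\qquad
\frac{W}{J} \approx \frac{120\ \text{min}}{8} = 15\ \text{min}
```

The reported ~30 minutes with 8 jobs is therefore about twice the ideal 15 minutes. Starting the longest suites first (the classic longest-processing-time-first heuristic) keeps a t_max-sized suite from being picked up last and pushing T well past W/J.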

How it works

  1. Build test artifacts. A single mvn install -DskipTests produces test-classes for every module.
  2. Discover suites. Walk every */test-classes/*.class file and filter the test suites by pattern into a flat suite list.
  3. Prioritize the long tail. Reorder the list so entries in slow_suites.txt (suites running ≥ 3 min) dispatch first, preventing them from trailing on a single worker at the end of the run.
  4. Parallel + isolated dispatch. xargs -P JOBS runs one mvn invocation per suite. Each invocation is wrapped in bwrap (bubblewrap) with private target/surefire and target/surefire-reports directories and a test-classes/ directory mounted with --tmp-overlay, so concurrent suites can't race on shared write paths or on Gluten's unit-tests-working-home/ warehouse.
  5. Classify results. Parse each suite log for *** FAILED *** / *** ABORTED *** markers and look up each failing case in blacklist.txt. Blacklisted failures are expected and ignored; any other failure is reported as unexpected and fails the run, with the offending keys printed for triage.
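Steps 2 to 4 can be sketched in bash as below. This is a minimal illustration under stated assumptions, not the code in run.sh: it assumes suites are top-level classes whose names end in `Suite` and that slow_suites.txt lists one fully qualified suite name per line. `discover_suites`, `prioritize` and `dispatch` are hypothetical names, and `run_one_suite` is a stub standing in for the script's real function of that name, which runs mvn (inside bwrap, per step 4).

```shell
#!/usr/bin/env bash
# Minimal sketch of steps 2-4 (discover, prioritize, dispatch).
# Assumptions, not taken from run.sh: suite classes are the top-level classes
# whose names end in "Suite", slow_suites.txt lists one fully qualified name
# per line, and discover_suites/prioritize/dispatch are hypothetical names.
set -euo pipefail

# Step 2: map .../test-classes/org/foo/BarSuite.class to org.foo.BarSuite,
# skipping nested/companion classes (names containing '$').
discover_suites() {
  local root=$1
  find "$root" -path '*/test-classes/*' -name '*Suite.class' ! -name '*$*' -print |
    sed -E 's#^.*/test-classes/##; s#\.class$##; s#/#.#g' |
    sort -u
}

# Step 3: emit known-slow suites first (in slow-list order, skipping any that
# were not discovered), then every remaining suite in its original order.
prioritize() {
  local slow_file=$1 all_file=$2
  grep -Fxf "$all_file" "$slow_file" || true
  grep -Fvxf "$slow_file" "$all_file" || true
}

# Hypothetical stand-in for the script's run_one_suite, which runs one mvn
# invocation per suite; this stub only reports which suite it was given.
run_one_suite() {
  echo "ran $1"
}
export -f run_one_suite

# Step 4: one invocation per suite, at most $jobs at a time. xargs hands out
# lines in input order, so the slow suites are the first to start.
dispatch() {
  local jobs=$1 list=$2
  xargs -P "$jobs" -I{} bash -c 'run_one_suite "$1"' _ {} < "$list"
}
```

Chained as `discover_suites "$GLUTEN_HOME" > all.txt`, then `prioritize slow_suites.txt all.txt > ordered.txt`, then `dispatch 8 ordered.txt`, the slow suites start first because xargs consumes its input in order. GNU xargs exits with status 123 when any invocation fails with a status from 1 to 125, which a real driver could check in addition to the per-suite logs.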

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note


Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)


zhangxffff force-pushed the run_gluten_ut branch 2 times, most recently from d5f35a5 to c16031f on May 26, 2026 09:48
zhangxffff marked this pull request as draft on May 27, 2026 06:59
zhangxffff and others added 3 commits June 10, 2026 09:20
… failures

run 27086097418 still exited 1: the blacklist was missing the two
GlutenDateExpressionsSuite#Gluten - {unix_timestamp,to_unix_timestamp}
cases (same timestamp-handling gap as the already-listed cast-to-timestamp
failures). Replaying run.sh's summary over the run's reports now yields
expected=17 unexpected=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
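The classification from step 5 of How it works, which yields tallies such as the expected=17 unexpected=0 above, can be approximated as below. This is a hypothetical sketch, not the code in run.sh: the function names, the one-log-per-suite naming (`<Suite>.log`), the `<Suite>#<test name>` key format (inferred from the GlutenDateExpressionsSuite#Gluten - … keys quoted above) and the `<Suite>#ABORTED` key for aborted suites are all assumptions, and the logs are assumed to contain no ANSI escape codes.

```shell
#!/usr/bin/env bash
# Minimal sketch of step 5 (classify results against the blacklist).
# Assumptions, not taken from run.sh: one plain-text log per suite named
# <Suite>.log, failing tests logged as "- <test name> *** FAILED ***", keys of
# the form <Suite>#<test name>, and a hypothetical <Suite>#ABORTED key for
# aborted suites. failure_keys/classify are illustrative names.
set -euo pipefail

# Print one key per failure marker found in a suite log.
failure_keys() {
  local log=$1 suite
  suite=$(basename "$log" .log)
  sed -nE 's/^- (.*) \*\*\* FAILED \*\*\*$/\1/p' "$log" | sed "s/^/${suite}#/"
  if grep -q '\*\*\* ABORTED \*\*\*' "$log"; then
    echo "${suite}#ABORTED"
  fi
}

# Look every key up in the blacklist (exact, whole-line match). Unexpected
# keys are printed with a "! " prefix, followed by the totals; the function
# returns non-zero when at least one failure was unexpected.
classify() {
  local blacklist=$1 log key
  local expected=0 unexpected=0
  shift
  while IFS= read -r key; do
    if grep -Fxq -- "$key" "$blacklist"; then
      expected=$((expected + 1))
    else
      unexpected=$((unexpected + 1))
      echo "! $key"
    fi
  done < <(for log in "$@"; do failure_keys "$log"; done)
  echo "expected=$expected unexpected=$unexpected"
  [ "$unexpected" -eq 0 ]
}
```

Printing each unexpected key with a `! ` prefix mirrors the per-failure `! key` lines mentioned in the next commit message, and the non-zero return lets a `set -e` caller fail the CI step.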
…red)

run_one_suite appends the GLUTEN_UT_MVN_RC marker right after mvn's
colorised output, which can end with a trailing ANSI reset and no newline.
The marker line then reads "\e[0m\e[0mGLUTEN_UT_MVN_RC=0", so the summary's
'^'-anchored `grep -m1 '^GLUTEN_UT_MVN_RC='` never matched. Under
`set -euo pipefail` the failed `rc=$(grep|cut)` aborted the script on the
first suite — right after the "Summary" banner — so the per-failure `! key`
lines and the expected/unexpected counts were never printed and the step
always exited 1, regardless of the blacklist.

Fix both sides: write the marker on its own line (leading \n), and match it
un-anchored + ANSI-tolerant with a `|| true` guard so a stray log can't kill
the summary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
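Both halves of this fix can be sketched as follows. The helper names `append_rc_marker` and `read_rc_marker` and the `missing` fallback are hypothetical; only the `GLUTEN_UT_MVN_RC=` marker and the approach (marker on its own line, un-anchored ANSI-tolerant match, `|| true` guard) come from the commit message.

```shell
#!/usr/bin/env bash
# Minimal sketch of the two-sided GLUTEN_UT_MVN_RC marker fix.
# append_rc_marker/read_rc_marker and the "missing" fallback are hypothetical.
set -euo pipefail

# Writer: the leading \n puts the marker on its own line even when mvn's
# colorized output ended with an ANSI reset and no trailing newline.
append_rc_marker() {
  local log=$1 rc=$2
  printf '\nGLUTEN_UT_MVN_RC=%s\n' "$rc" >> "$log"
}

# Reader: strip ANSI SGR sequences (ESC[...m), match without a '^' anchor,
# and guard the whole pipeline with `|| true` so a log without a marker
# cannot abort a `set -euo pipefail` caller.
read_rc_marker() {
  local log=$1 esc=$'\e' rc
  rc=$(sed "s/${esc}\[[0-9;]*m//g" "$log" |
       grep -oE -m1 'GLUTEN_UT_MVN_RC=[0-9]+' |
       cut -d= -f2 || true)
  echo "${rc:-missing}"
}
```

Stripping the escape sequences before matching means a marker glued to `\e[0m` resets is still found, and because `|| true` covers the whole `sed | grep | cut` pipeline, a log without any marker yields `missing` instead of aborting the summary.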

Development

Successfully merging this pull request may close these issues.

[Chore] add script and workflow to run gluten UT
