[VL][Delta] Delta CI pipeline by felipepessoto · Pull Request #12278 · apache/gluten

felipepessoto · 2026-06-11T20:43:01Z

What changes are proposed in this pull request?

I created this PR to start the discussion, so we can get an idea of how it would work, whether it is worth doing, etc.

Tests are failing, which can help uncover bugs.

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Copilot

Adds a GitHub Actions workflow `delta_spark_ut.yml` (plus a helper `util/delta-spark-ut/setup-delta.sh`) that runs delta-io/delta's `spark` sbt module unit tests with a Gluten Velox bundle built from this repository's source on the classpath. The pipeline lets us catch regressions in Gluten against the latest Delta release before they reach users.

Triggers:

* `workflow_dispatch` with overridable `delta_ref` (default v4.2.0), `spark_version` (default 4.1), and `test_parallelism` (default 1).
* `pull_request` when the workflow files, `gluten-delta/**`, or the reused `backends-velox/.../DeltaSQLCommandTest.scala` change.

Three jobs:

1. `build-native-lib-centos-7` -- builds the Velox/Gluten native libraries in the `apache/gluten:vcpkg-centos-7-gcc13` container (x86_64), reusing `dev/ci-velox-buildstatic-centos-7.sh` and the ccache layout other pipelines already use. Uploads the cpp/build tree and the Arrow jars from the local m2 repo as artifacts.
2. `build-gluten-bundle` -- in `apache/gluten:centos-9-jdk17`, downloads the native artifacts and runs `mvn clean install` with `-Pspark-4.1 -Pscala-2.13 -Pjava-17 -Pbackends-velox -Pdelta` (install, not package, so the gluten-delta jar lands in m2 before the shaded fat jar is built; `-Dmaven.compiler.release=17` defeats any user settings.xml pinning release=1.8). Uploads the resulting `gluten-velox-bundle-spark4.1_2.13-linux_amd64-*.jar`.
3. `delta-spark-test` -- in `apache/gluten:centos-9-jdk17`, sharded 8-ways (matching Delta's upstream `spark_test.yaml`), runs `setup-delta.sh` and then `sbt spark/test`. `timeout-minutes: 300` to fail faster than the 6h GH job timeout. Test reports are uploaded unconditionally; on failure we also upload `hs_err_pid*` and `/tmp/*.hprof(.gz)` heap dumps for post-mortem analysis.

Non-obvious design decisions captured in the code (and worth flagging for reviewers):

* **Bundle goes only in `delta/spark-unified/lib/`, NOT `delta/spark/lib/`.** sbt auto-scans `<baseDirectory>/lib` via `unmanagedBase`, and `unmanagedJars` are project-scoped (not inherited by dependents). In Delta v4.2.0 the layout is
  - `sparkV1 = project in file("spark")` -> `spark/lib`
  - `spark = project in file("spark-unified")` -> `spark-unified/lib`

  Placing the bundle only under `spark-unified/lib/` exposes it to the unified `spark` project's Compile and Test classpaths (where the forked test JVM loads `org.apache.gluten.GlutenPlugin` by name) while keeping it off `sparkV1`'s Compile classpath. The latter matters because the bundle pulls extra symbols under `org.apache.spark.sql` into scope, which would collide with Delta's own `MergeOutputGeneration.scala` (it imports both `org.apache.spark.sql._` and `org.apache.spark.sql.delta.ClassicColumnConversions._`).
* **`JAVA_TOOL_OPTIONS` carries the Netty + JDK-17 `--add-opens` set, but NOT `-Xmx`.** The flag set mirrors `extraJavaTestArgs` from Gluten's own root `pom.xml`. The crucial non-Delta-default flag is `-Dio.netty.tryReflectionSetAccessible=true`: without it Gluten's bundled Arrow allocator throws `UnsupportedOperationException: sun.misc.Unsafe or DirectByteBuffer ... not available` on first direct-buffer allocation. Delta's own `Test/javaOptions` (see `project/CrossSparkVersions.scala` `java17TestSettings`) already sets the base `--add-opens` but not the Netty property -- their own suites just don't load Arrow this way. `-Xmx` is intentionally NOT in `JAVA_TOOL_OPTIONS`: it is processed BEFORE the command line, so Delta's explicit `-Xmx1024m` would still win (last `-Xmx` wins).
* **Forked test JVM heap is bumped to `-Xmx6G` via `set spark / Test / javaOptions ++= Seq("-Xmx6G", ...)`.** `++=` appends to Delta's own seq, so our `-Xmx6G` lands AFTER `-Xmx1024m` and wins. Delta v4.2.0's 1G cap is far too tight once Gluten + Velox + Arrow + JNI are loaded -- a DV+CDC merge suite OOMs inside `RoaringBitmapArray.extendBitmaps` at that heap size. We also turn on `HeapDumpOnOutOfMemoryError` with `HeapDumpPath=/tmp/` so future OOMs come with a dump. `_JAVA_OPTIONS` is avoided because it would also override the sbt launcher heap and defeat budget accounting.
* **sbt launcher heap is `-J-Xmx4G`** -- sbt only compiles tests and orchestrates the test fork, so 4G is comfortable and leaves more of the 16 GB GH runner for the forked test JVM.
* **Single shared sbt/Ivy/Coursier cache across all shards** -- all shards have the same dependency tree, so a single cache key (with parallel saves resolved first-write-wins by GH) gives a better storage / hit-rate tradeoff than 8 isolated caches.
* **`yum install` does NOT include `curl`** -- the `apache/gluten:centos-9-jdk17` image ships `curl-minimal`, which provides the `curl` command (used by `sbt-launch-lib.bash` to download `sbt-launch.jar`) but conflicts with the full `curl` package. Installing the full package would fail with "package curl-minimal... conflicts with curl".
* **`DeltaSQLCommandTest` is reused, not duplicated.** `setup-delta.sh` overwrites Delta's own `DeltaSQLCommandTest.scala` with the existing copy under `backends-velox/src-delta40/...`. That file registers the Gluten plugin via typed `GlutenConfig` / `VeloxDeltaConfig` references, which resolve at test-compile time because the bundle is already on the unified `spark` project's Test classpath.
* **Delta's scalastyle `HeaderMatchesChecker` is disabled in the cloned Delta's `scalastyle-config.xml`.** Our reused `DeltaSQLCommandTest.scala` carries Gluten's ASF-only license header, which does not match Delta's expected regex (ASF + Spark-mod block + Delta copyright). `HeaderMatchesChecker` is a file-level checker that does not honor `// scalastyle:off`, so it has to be disabled at config level. The config is wired in via `ThisBuild / scalastyleConfig` in `project/Checkstyle.scala`, so a single edit covers every sbt sub-project.

Scope is intentionally limited to Velox + x86_64 to keep the matrix small. ClickHouse and aarch64 can be added later if needed.

Co-authored-by: Copilot
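The "last `-Xmx` wins" reasoning in the description can be checked with a small model. The following is a minimal sketch, not part of the PR: the helper name `effective_xmx_bytes` is hypothetical, and it only models the ordering rule the description states (options seen earlier are overridden by a later `-Xmx`), not the JVM's full option parsing.

```python
import re

# Matches -Xmx<number><optional unit>, e.g. -Xmx1024m or -Xmx6G.
_XMX = re.compile(r"^-Xmx(\d+)([kKmMgG]?)$")
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def effective_xmx_bytes(jvm_args):
    """Return the heap cap implied by the LAST -Xmx in jvm_args, or None."""
    result = None
    for arg in jvm_args:
        match = _XMX.match(arg)
        if match:
            # A later -Xmx replaces whatever an earlier one set.
            result = int(match.group(1)) * _UNIT_BYTES[match.group(2).lower()]
    return result


# Assumed ordering, following the description: Delta's own flag first,
# then the flags appended with `++=`.
delta_default = ["-Xmx1024m"]
appended = delta_default + [
    "-Xmx6G",
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:HeapDumpPath=/tmp/",
]
print(effective_xmx_bytes(appended) // 1024**3)  # 6
```

Under the same model, a `-Xmx` placed in `JAVA_TOOL_OPTIONS` would sit before `-Xmx1024m` in the list and lose, which is the reason the description gives for leaving it out.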

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a GitHub Actions pipeline to build a Gluten Velox bundle and run Delta Lake’s spark module unit tests against it, including automation to patch a Delta checkout and adjust its build config for the CI run.

Changes:

Introduces a new delta_spark_ut.yml workflow with native build, bundle assembly, and sharded Delta test execution.
Adds a setup-delta.sh helper to clone Delta, inject the bundle jar into the correct sbt project lib/, patch DeltaSQLCommandTest, and disable a scalastyle header check.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| .github/workflows/util/delta-spark-ut/setup-delta.sh | Automates cloning/patching Delta and adjusting scalastyle to make Gluten-enabled tests run. |
| .github/workflows/delta_spark_ut.yml | Defines the CI workflow to build Gluten artifacts and execute Delta Spark unit tests in shards. |


felipepessoto · 2026-06-12T08:23:47Z

@malinjawi @zhztheplayer @philo-he

zhztheplayer · 2026-06-12T16:40:01Z

Can we remove the duplicated tests in Gluten's codebase, if they are covered by the new way?

felipepessoto · 2026-06-12T18:44:47Z

I guess so. I don't know exactly what these duplicates are, whether they are exact copies, or which versions they come from.

philo-he

@felipepessoto, thanks for the PR.

philo-he · 2026-06-12T22:54:35Z

+  cancel-in-progress: true
+
+jobs:
+  build-native-lib-centos-7:


Is this job a duplicate of the one in velox_backend_x86.yml? If so, we might consider moving the Delta tests into that file to reuse the native artifact it builds, since that artifact can't be shared across different workflows.

This is also a consideration for reducing our GHA usage, see #12288

Running delta-io/delta's spark suite against the Gluten Velox bundle produces many expected failures. This adds a committed known-failures baseline and a per-shard gate so CI is green when only baseline failures occur and red on a genuine regression, enabling incremental fixes.

- Inject ScalaTest's -u JUnit XML reporter (Delta only configures the console reporter, so no machine-readable per-test results existed).
- Capture sbt's exit so expected test failures don't fail the step; fail loudly only when zero reports are produced (compile/launch failure).
- compare-test-results.py classifies each test vs known-failures.txt (regression / expected / now-passing) in enforce/seed/aggregate modes.
- Add update_baseline + fail_on_fixed inputs and an aggregate job that emits a ready-to-commit baseline artifact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
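The three-way classification described in this commit message can be expressed with set operations. This is a minimal sketch under stated assumptions, not the code in `compare-test-results.py`: the function names `classify` and `gate` are hypothetical, test identifiers are treated as opaque strings, and `fail_on_fixed` is assumed to mean "fail when a baseline entry now passes".

```python
def classify(passed, failed, known_failures):
    """Split one shard's results into the three categories named above."""
    return {
        # Failed now, but not in the baseline: a genuine regression.
        "regression": sorted(failed - known_failures),
        # Failed now and listed in the baseline: expected.
        "expected": sorted(failed & known_failures),
        # Listed in the baseline but passed: the baseline can shrink.
        "now_passing": sorted(passed & known_failures),
    }


def gate(result, fail_on_fixed=False):
    """Return a process exit code for the shard (0 = green, 1 = red)."""
    if result["regression"]:
        return 1
    if fail_on_fixed and result["now_passing"]:
        return 1
    return 0


# Made-up test identifiers, for illustration only.
result = classify(
    passed={"SuiteA.t1", "SuiteB.t3"},
    failed={"SuiteA.t2", "SuiteC.t4"},
    known_failures={"SuiteA.t2", "SuiteB.t3"},
)
print(result["regression"], gate(result))
```

Baseline entries that did not run in a given shard fall into none of the three categories here, so a per-shard gate does not flag them.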

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

+    if args.failures_out:
+        write_entries(args.failures_out, failed)
+    if args.ran_out:
+        write_entries(args.ran_out, passed | failed)


+    for pattern in ("**/TEST-*.xml", "**/target/**/*.xml"):
+        xml_files.extend(glob.glob(os.path.join(reports_dir, pattern), recursive=True))
+    xml_files = sorted(set(xml_files))
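The two glob patterns in this hunk can match the same file (a `TEST-*.xml` under `target/`), which is why the result is deduplicated with `set`. As a rough illustration of what might happen to each collected file, here is a minimal sketch that derives passed and failed sets from JUnit-style XML. The function name `read_junit_results` and the `classname.name` identifier format are assumptions for illustration; the actual script may differ.

```python
import xml.etree.ElementTree as ET


def read_junit_results(xml_text):
    """Return (passed, failed) sets of test ids from one JUnit-style report."""
    passed, failed = set(), set()
    root = ET.fromstring(xml_text)
    # iter() finds <testcase> whether the root is <testsuite> or <testsuites>.
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(test_id)
        elif case.find("skipped") is None:
            # Skipped tests are counted as neither passed nor failed.
            passed.add(test_id)
    return passed, failed


# Hand-written sample report, for illustration only.
sample = """
<testsuite name="org.example.FooSuite" tests="3">
  <testcase classname="org.example.FooSuite" name="a passes"/>
  <testcase classname="org.example.FooSuite" name="b fails">
    <failure message="boom"/>
  </testcase>
  <testcase classname="org.example.FooSuite" name="c ignored">
    <skipped/>
  </testcase>
</testsuite>
"""
passed, failed = read_junit_results(sample)
print(sorted(passed), sorted(failed))
```

An XML file that contains no `testcase` elements yields two empty sets in this sketch, so a broad pattern such as `**/target/**/*.xml` would not add spurious results, although malformed XML would still raise a parse error.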


+sed -i \
+  's|<check level="error" class="org.scalastyle.file.HeaderMatchesChecker" enabled="true">|<check level="error" class="org.scalastyle.file.HeaderMatchesChecker" enabled="false">|' \
+  "$SCALASTYLE_CONFIG"
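One property of `sed -i` worth keeping in mind is that it exits 0 even when the pattern matches nothing, so a change in Delta's `scalastyle-config.xml` layout would leave the checker enabled without any error. The sketch below shows the same substitution with an explicit check. It is an illustration only, not part of the PR, and the function name `disable_header_check` is hypothetical.

```python
def disable_header_check(config_text):
    """Flip HeaderMatchesChecker to enabled="false", failing if it is absent."""
    old = (
        '<check level="error" '
        'class="org.scalastyle.file.HeaderMatchesChecker" enabled="true">'
    )
    new = old.replace('enabled="true"', 'enabled="false"')
    if old not in config_text:
        # Unlike sed, refuse to succeed silently when nothing matched.
        raise ValueError("HeaderMatchesChecker entry not found in config")
    return config_text.replace(old, new)


# Hand-written fragment, for illustration only.
fragment = (
    "<scalastyle>\n"
    ' <check level="error" '
    'class="org.scalastyle.file.HeaderMatchesChecker" enabled="true">\n'
    " </check>\n"
    "</scalastyle>\n"
)
print('enabled="false"' in disable_header_check(fragment))  # True
```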


At 8-way sharding, ~5 of the 8 delta-spark-test shards consistently exceed the 300-minute job timeout: Delta's generated merge/CDC/DV suites are very slow under Gluten. Observed in run 16 (shards 0,1,2,3,6 all hit the 300-min cap and were cancelled, while 4,5,7 finished in <130 min), and reproduced on the current run. Cancelled shards never upload their results, so the known-failures baseline could only ever capture a subset of suites.

Delta's GreedyHashStrategy balances high-duration suites across shards by estimated duration, so doubling NUM_SHARDS 8 -> 16 roughly halves per-shard wall time and lets every shard finish (and report) within the timeout. Bump timeout-minutes 300 -> 350 for extra margin. TEST_PARALLELISM_COUNT stays 1 (each forked test JVM uses ~8G, so >1 would OOM the runner).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
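The claim that doubling the shard count roughly halves per-shard wall time can be illustrated with a generic greedy balancer. This is a sketch of a longest-first greedy assignment under invented durations; it is not Delta's `GreedyHashStrategy`, and the function name `assign_greedy` and the suite names are hypothetical.

```python
import heapq


def assign_greedy(durations, num_shards):
    """Place suites longest-first onto whichever shard is currently lightest."""
    heap = [(0, shard) for shard in range(num_shards)]
    heapq.heapify(heap)
    assignment = {shard: [] for shard in range(num_shards)}
    # Sort by descending duration, then name, so the result is deterministic.
    for suite, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, shard = heapq.heappop(heap)
        assignment[shard].append(suite)
        heapq.heappush(heap, (load + minutes, shard))
    loads = {
        shard: sum(durations[s] for s in suites)
        for shard, suites in assignment.items()
    }
    return assignment, loads


# Invented workload: 64 suites of 40 minutes each.
durations = {f"Suite{i:02d}": 40 for i in range(64)}
_, loads8 = assign_greedy(durations, 8)
_, loads16 = assign_greedy(durations, 16)
print(max(loads8.values()), max(loads16.values()))  # 320 160
```

The halving only holds while no single suite dominates a shard: a suite that takes longer than the timeout on its own cannot be split by adding shards.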

Copilot AI review requested due to automatic review settings June 11, 2026 20:43

github-actions Bot added the INFRA label Jun 11, 2026

Copilot AI reviewed Jun 11, 2026

Comment thread .github/workflows/delta_spark_ut.yml

Comment thread .github/workflows/delta_spark_ut.yml

Comment thread .github/workflows/delta_spark_ut.yml

Comment thread .github/workflows/util/delta-spark-ut/setup-delta.sh

philo-he reviewed Jun 12, 2026

github-actions Bot added the DOCS label Jun 14, 2026

Copilot AI review requested due to automatic review settings June 14, 2026 00:20

felipepessoto force-pushed the delta_pipeline branch from 86acf61 to e190fc0 Compare June 14, 2026 00:20

Copilot AI reviewed Jun 14, 2026

felipepessoto wants to merge 3 commits into apache:main from felipepessoto:delta_pipeline

4 participants
