feat: add initial support for array_exists with lambda expression support #3611
andygrove wants to merge 8 commits into apache:main

Conversation
Add native support for `array_exists(arr, x -> predicate(x))` in SQL and DataFrame API. This is the first general-purpose lambda expression infrastructure, which can later be extended to support `array_filter`, `array_transform`, and `array_forall`.

The lambda body is serialized as an ordinary expression tree, where `NamedLambdaVariable` leaf nodes are serialized as `LambdaVariable` proto messages. On the Rust side, `ArrayExistsExpr` evaluates the lambda body vectorized over all elements in a single pass: it flattens list values, expands the batch with repeat indices, appends the elements as a `__comet_lambda_var` column, evaluates once, and reduces per row with SQL three-valued logic semantics. Unsupported lambda bodies (e.g. those containing UDFs) fall back to Spark.

Closes apache#3149
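The per-row reduction described above follows SQL three-valued logic: a row yields true if any element satisfies the predicate, false only if every element evaluates to false (including the empty array), and null if no element is true but at least one predicate result is null. A minimal sketch of that reduction over plain Rust options, independent of the actual Arrow-based `ArrayExistsExpr` implementation:

```rust
/// Reduce per-element predicate results for one row using SQL
/// three-valued logic, mirroring the semantics described above:
/// - Some(true)  if any element evaluates to true
/// - None (null) if no element is true but at least one result is null
/// - Some(false) otherwise (including the empty array)
fn exists_reduce(results: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for r in results {
        match r {
            Some(true) => return Some(true), // short-circuit on first match
            None => saw_null = true,
            Some(false) => {}
        }
    }
    if saw_null { None } else { Some(false) }
}

fn main() {
    assert_eq!(exists_reduce(&[Some(false), Some(true)]), Some(true));
    assert_eq!(exists_reduce(&[Some(false), None]), None);
    assert_eq!(exists_reduce(&[Some(false), Some(false)]), Some(false));
    assert_eq!(exists_reduce(&[]), Some(false));
}
```

This is only a sketch of the semantics; the PR performs the same reduction over Arrow arrays after a single vectorized evaluation of the lambda body.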
- Remove unused element_type proto field from ArrayExists
- Add LargeListArray support via decompose_list helper
- Use column index instead of name for lambda variable lookup
- Add TimestampNTZType to supported element types
- Restore CometNamedLambdaVariable as standalone serde object
- Remove SQL-based Scala tests (covered by SQL file tests)
- Add DataFrame tests for decimal and date element types
- Add negative test for unsupported element type fallback
- Add multi-column batch Rust unit test
```rust
for (i, col) in batch.columns().iter().enumerate() {
    let expanded = take(col.as_ref(), &repeat_indices_array, None)?;
    expanded_columns.push(expanded);
    expanded_fields.push(Arc::new(batch.schema().field(i).clone()));
}
```
non-blocking: I believe this will also expand uncaptured columns (those not referenced in the lambda body).
To avoid that costly expansion, it is possible to either:
- use a NullArray, since its creation is O(1) regardless of length, or
- include only the captured columns and the lambda variable in the batch, and rewrite the lambda body to adjust column indices, as done in https://github.com/apache/datafusion/pull/18329/changes#diff-ac23ff0fe78acd71875341026dd5907736e3e3f49e2c398a69e6b33cb6394ae8R92-R139
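The second suggestion can be sketched as an index-remapping step: collect the original column indices the lambda body captures, build a reduced batch containing only those columns plus the lambda variable, and rewrite each column reference in the body to its new position. The helper below is purely illustrative; its name and shape are assumptions, not Comet's or DataFusion's actual API:

```rust
// Hypothetical sketch: remap captured column indices into a reduced
// batch layout of [captured columns..., lambda variable]. `captured`
// holds the original column indices referenced by the lambda body.
use std::collections::HashMap;

fn remap_captured(captured: &[usize]) -> (HashMap<usize, usize>, usize) {
    let mut mapping = HashMap::new();
    for (new_idx, &old_idx) in captured.iter().enumerate() {
        // Each old index maps to its position among the captured columns.
        mapping.insert(old_idx, new_idx);
    }
    // The lambda variable column is appended after the captured columns.
    let lambda_var_index = captured.len();
    (mapping, lambda_var_index)
}

fn main() {
    // Lambda body references original columns 3 and 7 of a wide batch.
    let (map, var_idx) = remap_captured(&[3, 7]);
    assert_eq!(map[&3], 0);
    assert_eq!(map[&7], 1);
    assert_eq!(var_idx, 2);
}
```

With this mapping, only the captured columns need the costly `take` expansion; everything else is simply dropped from the intermediate batch.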
Thanks @gstvg. Excellent advice. I have implemented this now.
comphead left a comment:
Thanks @andygrove, this is actually great. I'm going through this PR today to see how DataFusion can learn from it.
```scala
expr.function match {
  case LambdaFunction(body, Seq(_: NamedLambdaVariable), _) =>
```
Do we need a fallback for cases where we cannot support a particular lambda?
For instance, nested lambdas such as `exists(arr1, x -> exists(arr2, y -> y > x))` will probably not work correctly.
- Use NullArray for uncaptured columns to avoid costly take() expansion
- Add tests for literal false, true, and null lambdas
- Fallback when legacy non-three-valued logic mode is configured
- Detect and fallback for nested lambda expressions
# Conflicts:
#   native/spark-expr/src/array_funcs/mod.rs
```scala
object CometArrayExists extends CometExpressionSerde[ArrayExists] {

  /** Check if a lambda body contains nested lambda expressions (e.g., nested exists calls). */
  private def containsNestedLambda(expr: Expression, currentVar: NamedLambdaVariable): Boolean = {
```
Does `currentVar` get used?
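The kind of check `containsNestedLambda` performs is a recursive walk that reports whether a lambda appears anywhere inside a lambda body. The toy Rust sketch below illustrates the idea over a made-up expression enum; it is not the Scala code in the PR:

```rust
// Toy expression tree for illustration only; this is not Comet's
// Expression type.
enum Expr {
    Lambda(Box<Expr>), // a lambda with its body
    Call(Vec<Expr>),   // any function call over child expressions
    Var,               // a lambda variable reference
}

/// Returns true if `expr` contains a lambda anywhere inside it.
/// Applied to an outer lambda *body*, this detects nested lambdas,
/// which the PR falls back to Spark for.
fn contains_lambda(expr: &Expr) -> bool {
    match expr {
        Expr::Lambda(_) => true,
        Expr::Call(children) => children.iter().any(contains_lambda),
        Expr::Var => false,
    }
}

fn main() {
    // e.g. exists(arr2, y -> ...) nested inside an outer lambda body
    let nested_body = Expr::Call(vec![Expr::Lambda(Box::new(Expr::Var))]);
    assert!(contains_lambda(&nested_body));

    let flat_body = Expr::Call(vec![Expr::Var]);
    assert!(!contains_lambda(&flat_body));
}
```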
```
statement
INSERT INTO test_array_exists VALUES
  (array(1, 2, 3), array('a', 'bb', 'ccc'), array(1.5, 2.5, 3.5), array(false, false, true), array(100, 200, 300), 2),
  (array(1, 2), array('a', 'b'), array(0.5, 1.5), array(false, false), array(10, 20), 5),
  (array(), array(), array(), array(), array(), 0),
  (NULL, NULL, NULL, NULL, NULL, 1),
  (array(1, NULL, 3), array('a', NULL, 'ccc'), array(1.0, NULL, 3.0), array(true, NULL, false), array(100, NULL, 300), 2)
```
Might be worth adding a test for `exists(array(0, null, 2, 3, null), x -> x IS NULL)`, i.e. null elements where the predicate returns a non-null value.
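In the SQL file test convention this PR already uses, the suggested case could look something like the following (hypothetical, not part of the PR; the expected `true` assumes standard Spark `exists` semantics, where `x IS NULL` evaluates to true for a null element):

```
-- hypothetical test: null elements, non-null predicate results
query
SELECT exists(array(0, null, 2, 3, null), x -> x IS NULL)
----
true
```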
```
-- long type
query
SELECT exists(arr_long, x -> x > 250) FROM test_array_exists
```
Should we add a case for Timestamp/TimestampNTZ?
I believe apache/datafusion#21231 will affect this. A CaseWhen/If within a lambda that uses its variable will remove it from the batch. I think a check can be added in https://github.com/apache/datafusion-comet/blob/main/spark/src/main/scala/org/apache/comet/serde/conditional.scala until this gets fixed.
Closes #3149

Summary
- Adds `array_exists(arr, x -> predicate(x))` in SQL and DataFrame API
- Lambda infrastructure can later be extended to `array_filter`, `array_transform`, `array_forall`