[Bug] arrays_zip fails with more than 6 arguments in SparkSQL due to arity limits #622

@JinyuanZhang617

Component Selection

  • Core Engine (Expression eval, Memory, Vector)
  • Connectors / File Formats (Hive, Parquet, etc.)
  • API / Bindings (Python, etc.)
  • Build
  • Other

Describe the Bug

When a Spark SQL query calls `arrays_zip` with more than 6 arguments (e.g., `arrays_zip(a1, a2, ..., a10)`) on Bolt via Gluten, the executor fails with an error reporting that the scalar function is not registered for that number of arguments. Only signatures with 2 to 6 arguments are registered.

**Error message:**
```text
Scalar function arrays_zip not registered with arguments:
  (ARRAY<VARCHAR> x 10)
Found function registered with the following signatures:
  arity = 2,3,4,5,6
```

Reproduction Steps

```sql
CREATE TABLE OLAP_gluten_test.arrays_zip_10args_repro (
  a1 array<string>,
  a2 array<string>,
  a3 array<string>,
  a4 array<string>,
  a5 array<string>,
  a6 array<string>,
  a7 array<string>,
  a8 array<string>,
  a9 array<string>,
  a10 array<string>
) USING parquet;

INSERT INTO OLAP_gluten_test.arrays_zip_10args_repro VALUES
  (array('a','b'), array('c','d'), array('e','f'), array('g','h'),
   array('i','j'), array('k','l'), array('m','n'), array('o','p'),
   array('q','r'), array('s','t'));

SELECT zipped
FROM OLAP_gluten_test.arrays_zip_10args_repro
LATERAL VIEW explode(
  arrays_zip(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10)
) tmp AS zipped;
```
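
For reference, here is a minimal Python sketch of what the query above would be expected to produce if `arrays_zip` accepted any number of arguments. It is only a model of the documented Spark semantics (the N-th struct holds the N-th element of each input, shorter arrays are padded with NULL, and a NULL input array yields NULL), not Bolt or Gluten code. The name `arrays_zip_reference` is hypothetical, and the zero-argument behaviour is an assumption.

```python
from typing import Any, List, Optional, Sequence, Tuple


def arrays_zip_reference(
    *arrays: Optional[Sequence[Any]],
) -> Optional[List[Tuple[Any, ...]]]:
    """Hypothetical reference model of Spark's arrays_zip for any arity."""
    # A NULL input array makes the whole result NULL.
    if any(a is None for a in arrays):
        return None
    # Assumption: with no inputs the result is an empty array.
    longest = max((len(a) for a in arrays), default=0)
    # Row i holds element i of every input; shorter inputs are padded with None.
    return [
        tuple(a[i] if i < len(a) else None for a in arrays)
        for i in range(longest)
    ]


# The ten columns from the reproduction steps above.
repro_columns = [
    ["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"], ["i", "j"],
    ["k", "l"], ["m", "n"], ["o", "p"], ["q", "r"], ["s", "t"],
]

for row in arrays_zip_reference(*repro_columns):
    print(row)
```

Under these assumptions the reproduction query should return two rows, each a struct with ten fields, rather than failing on the arity check.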

Bolt Version / Commit ID

main branch @95bf0f9a99acaf932170d30682b7688ff6226d44

System Configuration

- **OS**: (e.g. Ubuntu 22.04, CentOS 7)
- **Compiler**: (e.g. GCC 11, Clang 14)
- **Build Type**: (Debug / Release / RelWithDebInfo)
- **CPU Arch**: (e.g. x86_64 AVX2, ARM64)
- **Framework**: (e.g. Spark 3.3, PrestoDB)

Logs / Stack Trace

Expected Behavior

No response

Additional context

No response
