Commit 52b845f
[GLUTEN] Fix wrong input_file_name() for BHJ build-side LocalRelation
## Background
When a query like the one below runs on Gluten (e.g. with the Velox backend),
`input_file_name()` returns an empty string:
```sql
SELECT a.event, input_file_name() AS fname
FROM parquet_table a
JOIN (SELECT 'X' AS k1, 123L AS k2) b
ON a.event = b.k1 AND a.device_id = b.k2;
```
The physical plan contains (key fragment):
```
ProjectExecTransformer [..., input_file_name#1816 AS fname]
+- BroadcastHashJoinExecTransformer ..., BuildLeft
   :- InputIteratorTransformer[..., input_file_name#1816]
   :  +- RowToColumnar
   :     +- *(1) Project [..., input_file_name() AS input_file_name#1816]   ← BUG
   :        +- ColumnarToRow
   :           +- BroadcastQueryStage 0
   :              +- ColumnarBroadcastExchange
   :                 +- LocalTableScan [...]   ← no file context
   +- ProjectExecTransformer [..., input_file_name#1816]
      +- FileScanTransformer parquet ...[..., input_file_name#1816]   ← same ExprId
```
## Root cause
`PushDownInputFileExpression.PreOffload` previously injected
`Project [..., input_file_name() AS attr#N]` above **every** `LeafExecNode`.
This causes two problems:
1. When BHJ's build side is a `LocalTableScanExec` / `RangeExec` /
`RDDScanExec` etc. (no real file context), nothing populates the
`InputFileBlockHolder` thread-local, so `input_file_name()` always returns `""`.
2. Both leaves end up reusing the same `ExprId`. When BHJ resolves
`left ++ right`, the outer `Project` is rebound to the build-side empty
attribute, so the final query returns an empty file name.
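The collision in point 2 can be illustrated with a small toy model. This is only a sketch: `Attr`, `bind`, and the file path are hypothetical names made up for the example, not Spark or Gluten classes, and "take the first attribute with a matching id" is a simplifying assumption about how a reference gets resolved against `left ++ right`.

```scala
// Toy model (not Spark's classes): an attribute is identified only by its ExprId.
final case class Attr(name: String, exprId: Long)

object ExprIdCollision {
  // Assumption for illustration: a reference resolves to the first
  // attribute in the input that carries the same ExprId.
  def bind(ref: Attr, input: Seq[Attr]): Int =
    input.indexWhere(_.exprId == ref.exprId)

  // Build side (LocalTableScan plus the injected Project) and stream side
  // (file scan) both expose the injected attribute under the same id.
  val buildOutput: Seq[Attr] =
    Seq(Attr("k1", 1L), Attr("k2", 2L), Attr("input_file_name", 1816L))
  val streamOutput: Seq[Attr] =
    Seq(Attr("event", 3L), Attr("device_id", 4L), Attr("input_file_name", 1816L))

  // With BuildLeft, the join output is build columns followed by stream columns.
  val joinedOutput: Seq[Attr] = buildOutput ++ streamOutput

  // One joined row, aligned with joinedOutput. The build side carries "",
  // the stream side carries a (hypothetical) real path.
  val joinedRow: Seq[String] =
    Seq("X", "123", "", "X", "123", "file:///data/part-0.parquet")

  // The outer Project's reference lands on the build-side column.
  val ordinal: Int = bind(Attr("input_file_name", 1816L), joinedOutput)
  val fname: String = joinedRow(ordinal)
}
```

In this model `ordinal` points at the build-side column, so `fname` is the empty string even though the stream side holds a real path.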
## Fix
Only inject `input_file_name()` on leaves that can really populate
`InputFileBlockHolder`:
- `FileSourceScanExec`
- v2 `BatchScanExec`
- Hive table scan (`HiveTableScanExecTransformer.isHiveTableScan`)
- `BatchScanExecTransformerBase` (already special-cased in community)
In addition, the `ProjectExec` match in `PreOffload` now requires that the
subtree actually contains at least one such file-aware source via the new
`hasInputFileRelatedSource` helper. This avoids producing a fake
`input_file_name` attribute on non-file leaves and avoids polluting the
common ExprId with the empty string from the BHJ build side.
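As a rough sketch of the idea, the check can be pictured as a recursive search for a whitelisted leaf. The commit message does not show the body or signature of `hasInputFileRelatedSource`, so everything below is an assumption: `Plan`, `Leaf`, `Node`, and `isFileAwareLeaf` are hypothetical toy types, and the whitelist is matched by node name only for illustration (the Hive entry stands in for the `isHiveTableScan` check).

```scala
// Toy plan nodes standing in for physical operators (not the real classes).
sealed trait Plan { def children: Seq[Plan] }
final case class Leaf(nodeName: String) extends Plan {
  val children: Seq[Plan] = Nil
}
final case class Node(nodeName: String, children: Seq[Plan]) extends Plan

object FileAwareSources {
  // Assumed whitelist of leaves that can populate InputFileBlockHolder.
  private val fileAware: Set[String] = Set(
    "FileSourceScanExec",
    "BatchScanExec",
    "HiveTableScanExec",
    "BatchScanExecTransformerBase")

  // Hypothetical helper: true only for whitelisted leaves.
  def isFileAwareLeaf(plan: Plan): Boolean = plan match {
    case Leaf(name) => fileAware.contains(name)
    case _          => false
  }

  // Guess at the helper's shape: true if any node in the subtree
  // is a file-aware leaf.
  def hasInputFileRelatedSource(plan: Plan): Boolean =
    isFileAwareLeaf(plan) || plan.children.exists(hasInputFileRelatedSource)
}

object FixDemo {
  // The two sides of the join from the plan in the Background section.
  val buildSide: Plan = Node("ColumnarToRow", Seq(Leaf("LocalTableScanExec")))
  val streamSide: Plan = Node("ProjectExec", Seq(Leaf("FileSourceScanExec")))

  val injectOnBuild: Boolean = FileAwareSources.hasInputFileRelatedSource(buildSide)
  val injectOnStream: Boolean = FileAwareSources.hasInputFileRelatedSource(streamSide)
}
```

Under these assumptions only the stream side qualifies for injection, so the build side no longer produces an attribute that shares the `ExprId`.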
## Test
Added `input_file_name() with BHJ build-side LocalRelation must return real path`
in `ScalarFunctionsValidateSuite` (backends-velox) covering:
- `fname` column is non-empty for every joined row;
- `fname` contains the real parquet path;
- compared against vanilla Spark.
The existing `test("input_file_name")` (file/Hive scan paths) is unchanged
because those scans are still in the whitelist.
Co-Authored-By: AIME <aime@bytedance.com>
Change-Id: I77c1fa343444488fb8b71deb8dd0b13d587d2155
1 parent 9202345 commit 52b845f
2 files changed
Lines changed: 68 additions & 3 deletions
File tree
- backends-velox/src/test/scala/org/apache/gluten/functions
- gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar
Lines changed: 44 additions & 0 deletions
Lines changed: 24 additions & 3 deletions