Commit 4aed81a
fix: preserve duplicate GROUPING SETS rows (#21058)
## Which issue does this PR close?
- Closes #21316.
## Rationale for this change
Duplicate grouping sets in a `GROUPING SETS` clause were incorrectly
collapsed during execution. The internal grouping id encoded only the
semantic null mask, so repeated grouping sets shared the same execution
key and were merged. As a result, DataFusion returned fewer rows than
PostgreSQL for the same query.
For example, with:
```sql
create table duplicate_grouping_sets(deptno int, job varchar, sal int, comm int);
insert into duplicate_grouping_sets values
(10, 'CLERK', 1300, null),
(20, 'MANAGER', 3000, null);
select deptno, job, sal, sum(comm), grouping(deptno), grouping(job), grouping(sal)
from duplicate_grouping_sets
group by grouping sets ((deptno, job), (deptno, sal), (deptno, job))
order by deptno, job, sal, grouping(deptno), grouping(job), grouping(sal);
```
PostgreSQL preserves the duplicate grouping set and returns:
```text
 deptno |   job   | sal  | sum | grouping | grouping | grouping
--------+---------+------+-----+----------+----------+----------
     10 | CLERK   |      |     |        0 |        0 |        1
     10 | CLERK   |      |     |        0 |        0 |        1
     10 |         | 1300 |     |        0 |        1 |        0
     20 | MANAGER |      |     |        0 |        0 |        1
     20 | MANAGER |      |     |        0 |        0 |        1
     20 |         | 3000 |     |        0 |        1 |        0
(6 rows)
```
Before this fix, DataFusion collapsed the duplicate `(deptno, job)`
grouping set and returned only 4 rows for the same query shape.
```text
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
| deptno | job     | sal  | sum(duplicate_grouping_sets.comm) | grouping(duplicate_grouping_sets.deptno) | grouping(duplicate_grouping_sets.job) | grouping(duplicate_grouping_sets.sal) |
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
| 10     | CLERK   | NULL | NULL                              | 0                                        | 0                                     | 1                                     |
| 10     | NULL    | 1300 | NULL                              | 0                                        | 1                                     | 0                                     |
| 20     | MANAGER | NULL | NULL                              | 0                                        | 0                                     | 1                                     |
| 20     | NULL    | 3000 | NULL                              | 0                                        | 1                                     | 0                                     |
+--------+---------+------+-----------------------------------+------------------------------------------+---------------------------------------+---------------------------------------+
```
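The collapse can be illustrated with a small standalone sketch. It is not DataFusion's code: `null_mask` is a hypothetical helper, and the bit order (leftmost column as the most significant bit, bit set when the column is absent from the grouping set) is an assumption chosen to match the `GROUPING()` values shown above. The point is only that a key built from the null mask alone cannot tell the two `(deptno, job)` sets apart.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): builds a null mask for one
/// grouping set over `n_cols` grouping columns. A bit is 1 when the column
/// is NOT part of the grouping set. Column 0 is the most significant bit
/// (an assumption made for this sketch).
fn null_mask(set: &[usize], n_cols: usize) -> u64 {
    let mut mask = 0u64;
    for col in 0..n_cols {
        mask <<= 1;
        if !set.contains(&col) {
            mask |= 1;
        }
    }
    mask
}

fn main() {
    // Columns: 0 = deptno, 1 = job, 2 = sal.
    // GROUPING SETS ((deptno, job), (deptno, sal), (deptno, job))
    let grouping_sets: Vec<Vec<usize>> = vec![vec![0, 1], vec![0, 2], vec![0, 1]];

    let masks: Vec<u64> = grouping_sets.iter().map(|s| null_mask(s, 3)).collect();
    let distinct: HashSet<u64> = masks.iter().copied().collect();

    // Three grouping sets, but only two distinct mask-only keys.
    println!("masks = {:?}, distinct keys = {}", masks, distinct.len());
    // prints: masks = [1, 2, 1], distinct keys = 2
}
```

With two input rows and only two distinct keys per row, the aggregation yields 4 groups instead of the 6 that PostgreSQL returns.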
## What changes are included in this PR?
- Preserve duplicate grouping sets by packing a duplicate ordinal into
the high bits of `__grouping_id`, so repeated occurrences of the same
grouping set pattern produce distinct execution keys.
- `GROUPING()` now reads the actual `__grouping_id` column type directly
from the schema (via `Aggregate::grouping_id_type`) rather than inferring
the bit width from the count of grouping expressions alone. This ensures
bitmask literals are correctly sized when duplicate-ordinal bits widen
the column type beyond what the expression count would imply.
- `GROUPING()` masks off the ordinal bits before returning the result,
so the duplicate-ordinal encoding is invisible to user-facing SQL and
semantics remain unchanged.
- Add regression coverage for the duplicate `GROUPING SETS` case in:
- `datafusion/core/tests/sql/aggregates/basic.rs`
- `datafusion/sqllogictest/test_files/group_by.slt`
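The encoding described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the function names (`pack_grouping_id`, `semantic_mask`, `assign_ids`, `required_bits`, `width_for`) are hypothetical, the ordinal is assumed to sit immediately above the `n_cols` mask bits, and the choice among 8, 16, 32, and 64 bit widths is assumed for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical layout: the low `n_cols` bits hold the semantic null mask,
/// the bits above hold the duplicate ordinal. Assumes `n_cols < 64`.
fn pack_grouping_id(null_mask: u64, ordinal: u64, n_cols: u32) -> u64 {
    (ordinal << n_cols) | null_mask
}

/// What a GROUPING()-style read would do: drop the ordinal bits so only the
/// semantic null mask remains. Assumes `n_cols < 64`.
fn semantic_mask(grouping_id: u64, n_cols: u32) -> u64 {
    grouping_id & ((1u64 << n_cols) - 1)
}

/// The k-th occurrence (starting at 0) of a mask gets ordinal k, so repeated
/// grouping sets end up with distinct ids.
fn assign_ids(masks: &[u64], n_cols: u32) -> Vec<u64> {
    let mut seen: HashMap<u64, u64> = HashMap::new();
    masks
        .iter()
        .map(|&m| {
            let ord = seen.entry(m).or_insert(0);
            let id = pack_grouping_id(m, *ord, n_cols);
            *ord += 1;
            id
        })
        .collect()
}

/// Bits needed for the mask plus the largest ordinal in use.
fn required_bits(n_cols: u32, max_ordinal: u64) -> u32 {
    let ordinal_bits = 64 - max_ordinal.leading_zeros(); // 0 when max_ordinal == 0
    n_cols + ordinal_bits
}

/// Smallest unsigned integer width (in bits) that holds `bits` bits.
fn width_for(bits: u32) -> u32 {
    match bits {
        0..=8 => 8,
        9..=16 => 16,
        17..=32 => 32,
        _ => 64,
    }
}

fn main() {
    // Masks for ((deptno, job), (deptno, sal), (deptno, job)) over 3 columns.
    let ids = assign_ids(&[1, 2, 1], 3);
    println!("ids = {:?}", ids); // prints: ids = [1, 2, 9]

    // Masking recovers the same semantic value for both duplicates.
    let masks: Vec<u64> = ids.iter().map(|&id| semantic_mask(id, 3)).collect();
    println!("masks = {:?}", masks); // prints: masks = [1, 2, 1]

    // Eight grouping columns fit in 8 bits, but one ordinal bit needs 16.
    println!("{} -> {}", width_for(required_bits(8, 0)), width_for(required_bits(8, 1)));
    // prints: 8 -> 16
}
```

The last line shows why sizing bitmask literals from the expression count alone is not enough: eight grouping columns would suggest an 8-bit id, yet a single duplicate pushes the id past 8 bits.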
## Are these changes tested?
- `cargo fmt --all`
- `cargo test -p datafusion duplicate_grouping_sets_are_preserved`
- `cargo test -p datafusion-physical-plan grouping_sets_preserve_duplicate_groups`
- `cargo test -p datafusion-physical-plan evaluate_group_by_supports_duplicate_grouping_sets_with_eight_columns`
- PostgreSQL validation against the same query/result shape
## Are there any user-facing changes?
- Yes. Queries that contain duplicate `GROUPING SETS` entries now return
the correct duplicated result rows, matching PostgreSQL behavior.
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

1 parent 91c2e04, commit 4aed81a
File tree: 4 files changed, +212 -50 lines changed
- datafusion
  - expr/src/logical_plan
  - optimizer/src/analyzer
  - physical-plan/src/aggregates
  - sqllogictest/test_files