fix: any and all functions should not have default frame or order_by#11930
fix: any and all functions should not have default frame or order_by#11930ssabdb wants to merge 5 commits intoibis-project:mainfrom
Conversation
|
Since the change is to the base sql compiler |
Almost, I think that file is just for testing the base Sql compiler, when ideally we would actually want to test this for each backend. So, it should go in @pytest.mark.parametrize("backend_name", _get_backends_to_test())
@pytest.mark.notimpl(["polars"], raises=ValueError, reason="not a SQL backend")
def test_window_filter_any(backend_name, snapshot):
t = ibis.table([("a", "string"), ("x", "int64")], name="t")
expr = t.filter((_.a == "hello").any().over(group_by=_.x))
sql = ibis.to_sql(expr, dialect=backend_name)
snapshot.assert_match(sql, "out.sql")Second, can you actually explain what the exclude_unsupported_window_frame_from_ops rewrite rule DOES? I am decently confident in this change because the tests pass, but I don't have a good grip on what the other side-effects of this change would be. If you understand, it would be great if you could add a comment to the rewrite func |
NickCrews
left a comment
There was a problem hiding this comment.
See my above comment, move the test to check all backends, and ideally explain the rewrite rule
|
Many thanks for the comments. I'm also confused by what some of these rewrites functions are trying to do with window functions and on moving the tests noticed that some backends, e.g. mysql were still creating a window function with an order by null. I think The function My conclusion is that these functions need a small amount of rationalization, probably to separate one to zero index transformation - there seem to be several of these - from forcing default order_by which feels like it should be consistent across all backends? I've pushed some (not ready for merging) changes which tries to work on the rewrites only. I suspect it will break a few things. |
Description of changes
The any and all operators should not have a default ordering or set a restricted 'current row' frame by default, because they are not operators which by default should be ordered.
I noticed a bug in bigquery where filtering e.g.
table.filter((.a == "hello").any().over(group_by=(.x)))produces an incorrect window clause, including
(PARTITION BYt0.xorder by null asc ROWS between unbounded preceding and current row)which leads to inconsistent and incorrect results.
The expected behaviour is to produce no frame or order by, e.g.
(PARTITION BYt0.xROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)This is because ordering by null and the current row results in a undefined order. I suspect this will probably apply to multiple backends.
The fix involves modifying
exclude_unsupported_window_frame_from_opsinrewrites.pyto excludeAnyandAll.Closes #11931