Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DictionaryArray in OVER clause #13153

Merged
merged 4 commits into from
Nov 1, 2024
Merged

Conversation

adriangb
Copy link
Contributor

Fixes #13151

@adriangb
Copy link
Contributor Author

I tested this manually with datafusion-cli. Happy to add a test if needed but maybe it's okay as is.

@github-actions github-actions bot added the optimizer Optimizer rules label Oct 29, 2024
@adriangb
Copy link
Contributor Author

cc @alamb

@findepi
Copy link
Member

findepi commented Oct 29, 2024

It looks like a new & cool capability. Is it possible to add a test?

@adriangb
Copy link
Contributor Author

More like existing functionality that wasn't enabled.

Any suggestions as to where I should add said test?

@alamb
Copy link
Contributor

alamb commented Oct 31, 2024

I tried to produce an error / write some examples in dictionary.slt and I couldn't figure out how to trigger an issue. Here is what I tried

Basically my feeble mind can't figure out if the newly added code can ever be executed.

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
diff --git a/datafusion/sqllogictest/test_files/dictionary.slt b/datafusion/sqllogictest/test_files/dictionary.slt
index 176331f57..e802ddfe6 100644
--- a/datafusion/sqllogictest/test_files/dictionary.slt
+++ b/datafusion/sqllogictest/test_files/dictionary.slt
@@ -320,6 +320,41 @@ ORDER BY
 2023-12-20T01:30:00 1000 f1 32.0
 2023-12-20T01:30:00 1000 f2 foo

+# Window Functions
+query TTTT
+SELECT "tag_id", "type",
+  lead("type") OVER (PARTITION BY "tag_id" ORDER BY "time") as "next_type",
+  lag("type") OVER (PARTITION BY "tag_id" ORDER BY "time") as "last_type"
+FROM m2;
+----
+1000 active active NULL
+1000 active active active
+1000 active active active
+1000 active active active
+1000 active active active
+1000 active passive active
+1000 passive passive active
+1000 passive passive passive
+1000 passive passive passive
+1000 passive NULL passive
+
+query TTTT
+SELECT "tag_id", "type",
+  lead("type") OVER (PARTITION BY "tag_id" ORDER BY "time"  RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) as "next_type",
+  lag("type") OVER (PARTITION BY "tag_id" ORDER BY "time") as "last_type"
+FROM m2;
+----
+1000 active active NULL
+1000 active active active
+1000 active active active
+1000 active active active
+1000 active active active
+1000 active passive active
+1000 passive passive active
+1000 passive passive passive
+1000 passive passive passive
+1000 passive NULL passive
+

@alamb
Copy link
Contributor

alamb commented Oct 31, 2024

I am concerned that the lack of testing here would mean we could accidentally break the behavior without knowing it in some future refactor

@alamb alamb marked this pull request as draft October 31, 2024 15:37
@alamb alamb marked this pull request as ready for review October 31, 2024 15:37
@alamb
Copy link
Contributor

alamb commented Oct 31, 2024

Oh, now I see there is an example in #13151 (comment)

Maybe we can just add that example to dictionary.slt and we'll be good?

@adriangb
Copy link
Contributor Author

I have no experience with these slt files but I'll give it a shot!

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Oct 31, 2024
@adriangb
Copy link
Contributor Author

@alamb is 359721f right?

@adriangb
Copy link
Contributor Author

Guess we'll find out soon 😆

@adriangb
Copy link
Contributor Author

@alamb looks like that worked 🚀

@alamb
Copy link
Contributor

alamb commented Nov 1, 2024

@alamb is 359721f right?

BTW I normally run this like cargo test --test sqllogictests -- window --complete and it will fill in the correct results

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @adriangb

@alamb alamb changed the title implement target type selection for range queries on dictionary data types Support DictionaryArray in OVER clause Nov 1, 2024
@alamb alamb merged commit 6c5823e into apache:main Nov 1, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Nov 1, 2024

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Range queries on Dictionary data types are not implemented
3 participants