Compute column statistics min & max for FilterExec and beyond #8155

Open
NGA-TRAN opened this issue Nov 13, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@NGA-TRAN
Contributor

Is your feature request related to a problem or challenge?

In IOx we use the column min and max statistics in an optimizer rule. However, these statistics are lost after the plan goes through a filter (and possibly many other operators).

This ticket is to compute the column statistics in a conservative way.

Describe the solution you'd like

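The column min/max values of a filter's output will be the same as its input's, but their precision changes from Exact to Inexact. A filter can only remove rows, so the input bounds still hold for the output, but they may no longer be tight.

When #8078 is implemented, the Inexact in this case will become Conservative or the like.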

While working on FilterExec, I will see if I can do the same for other operators.
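
To illustrate the proposal, here is a minimal sketch of carrying min/max through a filter while demoting Exact to Inexact. It is not the actual DataFusion implementation: the `Precision` enum below is a local stand-in modeled on DataFusion's type of the same name, and `ColumnMinMax` and `filter_output_stats` are hypothetical names introduced only for this example.

```rust
/// Local stand-in modeled on DataFusion's `Precision<T>`;
/// defined here so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

impl<T> Precision<T> {
    /// Demote an exact value to an inexact one.
    /// Inexact and absent values are returned unchanged.
    pub fn to_inexact(self) -> Self {
        match self {
            Precision::Exact(value) => Precision::Inexact(value),
            other => other,
        }
    }
}

/// Hypothetical, simplified per-column statistics (min and max only).
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMinMax {
    pub min: Precision<i64>,
    pub max: Precision<i64>,
}

/// Hypothetical helper: conservative statistics for a filter's output.
/// A filter only removes rows, so the input bounds remain valid for the
/// output, but they may not be tight, hence Exact becomes Inexact.
pub fn filter_output_stats(input: Vec<ColumnMinMax>) -> Vec<ColumnMinMax> {
    input
        .into_iter()
        .map(|col| ColumnMinMax {
            min: col.min.to_inexact(),
            max: col.max.to_inexact(),
        })
        .collect()
}
```

With this sketch, a column whose input statistics are `Exact(1)` and `Exact(10)` leaves the filter with `Inexact(1)` and `Inexact(10)`, while statistics that were already inexact or absent pass through unchanged.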

Describe alternatives you've considered

No response

Additional context

No response

@NGA-TRAN NGA-TRAN added the enhancement New feature or request label Nov 13, 2023
@NGA-TRAN
Contributor Author

@alamb Since this PR https://github.com/apache/arrow-datafusion/pull/8126/files only computes stats for row count and bytes, I will work on this to compute the rest of the stats.

@NGA-TRAN
Contributor Author

This is the first part of #8099
