Adjust Statistics::total_byte_size in Filter with a projection #13224

Open
alamb opened this issue Nov 1, 2024 · 0 comments
Labels
enhancement New feature or request

alamb commented Nov 1, 2024

Is your feature request related to a problem or challenge?

@Dandandan pointed out in https://github.com/apache/datafusion/pull/13187/files#r1824330274 that, when a projection is applied to a filter, the resulting Statistics calculation is not properly updated:

> I think the global stats (total_byte_size) are not correct either, doesn't take into account the reduced number of columns. It should do something similar as stats_projection for ProjectionExec

However, I did not want to try and add that in the bugfix PR #13187 because:

  • The total_byte_size calculation in filter also needs to take estimated selectivity into account
  • The calculation of total_byte_size in stats_projection is also somewhat suspect as it only accounts for "fixed sized" rows but still claims the size is known precisely
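
The second concern can be illustrated with a small, self-contained sketch. It does not use DataFusion's actual types or reproduce stats_projection: `Precision`, `ColumnWidth`, and `projected_byte_size` are hypothetical stand-ins, loosely modeled on the exact / inexact / absent distinction. The idea is that a byte size derived only from fixed-width columns should be reported as inexact as soon as any projected column has a variable width.

```rust
/// Hypothetical stand-in for a statistic that records how trustworthy it is.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical description of one projected column's per-row width.
#[derive(Debug, Clone, Copy)]
enum ColumnWidth {
    /// Fixed-width type, e.g. 8 bytes for a 64-bit integer.
    Fixed(usize),
    /// Variable-width type (strings, lists); per-row size is unknown.
    Variable,
}

/// Estimate the byte size of `num_rows` rows that contain only `columns`.
fn projected_byte_size(num_rows: Precision, columns: &[ColumnWidth]) -> Precision {
    let mut row_width = 0usize;
    let mut all_fixed = true;
    for column in columns {
        match column {
            ColumnWidth::Fixed(width) => row_width += *width,
            // Variable-width columns contribute an unknown number of bytes.
            ColumnWidth::Variable => all_fixed = false,
        }
    }
    match num_rows {
        // Only an exact row count over fixed-width columns gives an exact size.
        Precision::Exact(n) if all_fixed => Precision::Exact(n.saturating_mul(row_width)),
        // Otherwise the fixed-width sum is just an estimate, so say so.
        Precision::Exact(n) | Precision::Inexact(n) => {
            Precision::Inexact(n.saturating_mul(row_width))
        }
        Precision::Absent => Precision::Absent,
    }
}
```

With 10 rows and two fixed columns of 8 and 4 bytes, this sketch reports an exact 120 bytes. Swapping the second column for a variable-width one yields an inexact 80 bytes rather than a value that claims to be precise.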

Describe the solution you'd like

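Account for the projection in the filter statistics calculation somehow, so that total_byte_size reflects the reduced number of columns as well as the filter's estimated selectivity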
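
One possible shape for such an adjustment is sketched below. This is an assumption-laden illustration, not DataFusion code: `filter_projected_byte_size` and its parameters are hypothetical, and it assumes that bytes are spread across columns in proportion to their per-row widths, which will not hold for variable-width data. The result is always an estimate and should never be reported as exact.

```rust
/// Hypothetical helper: estimate the output byte size of a filter that keeps
/// roughly `selectivity` of its input rows and only the projected columns.
///
/// `input_row_width` and `projected_row_width` are assumed per-row byte widths
/// of the input schema and of the projected schema, respectively.
/// Returns `None` when no estimate can be made.
fn filter_projected_byte_size(
    input_bytes: Option<usize>,
    selectivity: f64,
    input_row_width: usize,
    projected_row_width: usize,
) -> Option<usize> {
    let bytes = input_bytes?;
    if input_row_width == 0 || selectivity.is_nan() {
        return None;
    }
    // Fraction of rows expected to pass the predicate.
    let row_fraction = selectivity.clamp(0.0, 1.0);
    // Fraction of each row's bytes that survives the projection.
    let column_fraction = (projected_row_width as f64 / input_row_width as f64).min(1.0);
    Some((bytes as f64 * row_fraction * column_fraction).ceil() as usize)
}
```

For example, an input of 1000 bytes, a selectivity of 0.5, and a projection that keeps 10 of 20 bytes per row gives an estimate of 250 bytes.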

Describe alternatives you've considered

https://github.com/apache/datafusion/blob/ac79ef3442e65f6197c7234da9fad964895b9101/datafusion/physical-plan/src/projection.rs#L261-L260

Additional context

No response

alamb added the enhancement (New feature or request) label Nov 1, 2024
jiashenC removed their assignment Nov 13, 2024