Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: move SanityChecker into physical-optimizer crate #14083

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mnpw
Copy link

@mnpw mnpw commented Jan 11, 2025

Which issue does this PR close?

Closes #14072.

Rationale for this change

From #14072

Historically DataFusion was one (very) large crate datafusion, and as it grew bigger we extracted various functionality into separate crates. This leads to both faster compile times (as the crates can be compiled in parallel) as well easier to navigate code (as the crates force a cleaner dependency separation)

What changes are included in this PR?

  • Move SanityChecker from datafusion crate to datafusion-physical-optimizer crate

Are these changes tested?

Ran cargo test

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Jan 11, 2025
@mnpw
Copy link
Author

mnpw commented Jan 11, 2025

@alamb for your consideration.

I don't like that datafusion-physical-optimizer needs to use datafusion crate as a dev-dependency. This was required as SanityChecker tests were tightly coupled with the helper functions in datafusion::physical_optimizer::test_utils. Perhaps we can consider moving these helper functions somewhere outside, say, to test-utils crate?

@alamb
Copy link
Contributor

alamb commented Jan 11, 2025

I don't like that datafusion-physical-optimizer needs to use datafusion crate as a dev-dependency.

Thanks @mnpw -- I agree this is not good and we shouldn't merge it in like that (and in fact I think it failed the circular dependency check)

I think one of the major sources of this dependency is the use of ParquetExec which is still defined in the core module...

Since datafusion already depends on phsyical-optimizer, what would you think about moving as many of the physical optimizer utils from

@alamb alamb marked this pull request as draft January 12, 2025 11:28
@alamb
Copy link
Contributor

alamb commented Jan 12, 2025

Converting to draft given the dependency situation we have uncovered

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move SanityChecker into datafusion-physical-optimizer crate
2 participants