-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In the greedy query planner, compute only O(n²)
trees instead of O(n³)
#1722
Conversation
Signed-off-by: Johannes Kalmbach <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1722 +/- ##
=======================================
Coverage 89.86% 89.86%
=======================================
Files 389 389
Lines 37308 37334 +26
Branches 4204 4206 +2
=======================================
+ Hits 33527 33552 +25
+ Misses 2485 2484 -1
- Partials 1296 1298 +2 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Johannes Kalmbach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1-1 with Johannes, thanks a lot for the fix. We discussed a few changes
Signed-off-by: Johannes Kalmbach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another quick round
Signed-off-by: Johannes Kalmbach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I will test this end to end, write a description, and then merge this
Conformance check passed ✅No test result changes. |
Quality Gate passedIssues Measures |
O(n³)
O(n³)
O(n²)
instead of O(n³)
O(n²)
instead of O(n³)
O(n²)
trees instead of O(n³)
First, a quick recap: For a connected component of
n
triples, the greedy query planner maintains a set of disjoint query execution trees. The initial set aren
simple query execution trees, each consisting of one index scan. In each step, the cheapest possible join of two trees that are not yet connected, is computed. The worst case is when each pair of triples is connected, so that the induced triple graph is a clique.In the implementation so far, a possible join is computed and its cost estimated (which involved constructing the respective tree) potentially many times, namely in each step until it is picked or no longer possible. Now, each possible join (and the respective query execution tree) is computed and evaluated exactly once and then discarded when it has been picked or is no longer possible. Since each step except the first adds exactly one tree (and throws out two trees, namely those that were joined), only a linear number of new trees have to be constructed in each step. In the code so far, this was a quadratic number in the worst case. This cost significant time for large connected components.
NOTE: The cheapest possible join in each step is still computed by a linear scan over all candidates, which in the worst case is a quadratic number in the first half of the steps. This could be improved by using a priority queues and other tricks. However, we would need to evaluate first whether these minimum computations actually cost significant time. In the code so far, the expensive part was the construction of the trees.