Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change CTE order in EXPLAIN output #30983

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ggevay
Copy link
Contributor

@ggevay ggevay commented Jan 9, 2025

This PR modifies how EXPLAIN prints Let/LetRec: It used to be Return first, and CTEs in bottom-up order, now it's CTEs in top-down order, and then Return. The main argument for the old order was that data flowed upwards both inside CTEs and across CTEs. However, the bottom-up order of CTE bindings kept tripping up optimizer developers, so it's probably better to have the CTEs in binding order (top-down). As @mgree noted, other databases also print in top-down order.

For example:

EXPLAIN
WITH MUTUALLY RECURSIVE
  c0(n int) AS (
    (SELECT n FROM init)
    UNION ALL
    (SELECT * FROM c2)
  ),
  c1(n int) AS (
    SELECT n+n FROM c0 WHERE n<5
  ),
  c2(n int) AS (
    (SELECT * FROM c0)
    UNION ALL
    (SELECT * FROM c1)
    UNION ALL
    (SELECT * FROM c1)
  )
SELECT * FROM c2;

used to be

Explained Query:
  Return
    Get l2
  With Mutually Recursive
    cte l2 =
      Union
        Get l0
        Get l1
        Get l1
    cte l1 =
      Project (#1)
        Filter (#0 < 5)
          Map ((#0 + #0))
            Get l0
    cte l0 =
      Union
        Project (#0)
          ReadStorage materialize.public.init
        Get l2

and now it would be

Explained Query:
  With Mutually Recursive
    cte l0 =
      Union
        Project (#0)
          ReadStorage materialize.public.init
        Get l2
    cte l1 =
      Project (#1)
        Filter (#0 < 5)
          Map ((#0 + #0))
            Get l0
    cte l2 =
      Union
        Get l0
        Get l1
        Get l1
  Return
    Get l2

The first commit is straightforward: just a few lines of code changes, and a bunch of expected test output rewrites (most of it with auto-rewrite, except for testdrive).

The second commit is unfortunately not so straightforward. The problem is that the .spec test framework's input format is supposed to correspond to EXPLAIN's output format, but now we are reversing EXPLAIN's output format. There are several options on how to handle this situation:

  1. Do nothing; the spec input format would keep using the old order. This has the problem that we'd have to keep the old order in mind, and any time when we want to create a new spec test in the future, we'd have to manually reverse the CTEs when copy-pasting an EXPLAIN output to be a spec input.
  2. Make the spec parser be able to handle both orders. This is what I did in this PR. This has the advantage compared to 1. that creating new spec tests by copy-pasting an EXPLAIN output is possible. However, it still has a readability problem (for humans) for existing spec test inputs, as one has to keep in mind that the order might go both ways.
  3. Extend the spec test framework to be able to auto-rewrite inputs, not just outputs. We'd have something like a REWRITE_SPEC_INPUTS env var (similar to REWRITE), which would simply roundtrip spec test inputs through the spec parser and EXPLAIN, and use the result to rewrite the inputs. This would be the cleanest solution from a usability perspective. However, it's not totally straightforward to implement, because the spec framework relies on the (external) datadriven framework, which doesn't provide a facility for rewriting inputs out of the box. We'd probably need to crack open datadriven, and copy-paste some code from it to implement this ourselves. I'm open to doing this. I have a feeling that we won't be able to avoid this in the long run, when we keep making further changes to EXPLAIN output.

Now, 2. vs. 3. is open for debate. This PR just did 2. so far, because it was a matter of minutes to implement, but if we want to commit to the spec test framework for the long-term, then sooner or later we'll have to do 3., so might as well do it now (maybe in an immediate follow-up PR), to avoid keeping around the tech-debt of needing to keep in mind both the old and new orders when reading spec test inputs.

Motivation

  • This PR changes EXPLAIN output, to eliminate a surprising element from it.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay force-pushed the explain-binding-order branch 3 times, most recently from 703db87 to b7555d1 Compare January 13, 2025 11:16
@ggevay ggevay force-pushed the explain-binding-order branch from b7555d1 to a864863 Compare January 13, 2025 13:33
Copy link
Contributor Author

@ggevay ggevay Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to change the test in this file manually, because these tests simply expect a clean roundtrip through the spec parser and then EXPLAIN.

Copy link
Contributor Author

@ggevay ggevay Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed a test input in this file manually to test the new order in the spec parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changed inputs in this file were not well-formed before the PR. (Which is more evidence that the old order was confusing.)

@ggevay ggevay marked this pull request as ready for review January 13, 2025 14:30
@ggevay ggevay requested a review from a team as a code owner January 13, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant