Change CTE order in EXPLAIN output #30983

ggevay · 2025-01-09T09:19:54Z

This PR modifies how EXPLAIN prints Let/LetRec: It used to be Return first, and CTEs in bottom-up order, now it's CTEs in top-down order, and then Return. The main argument for the old order was that data flowed upwards both inside CTEs and across CTEs. However, the bottom-up order of CTE bindings kept tripping up optimizer developers, so it's probably better to have the CTEs in binding order (top-down). As @mgree noted, other databases also print in top-down order.

For example:

EXPLAIN
WITH MUTUALLY RECURSIVE
  c0(n int) AS (
    (SELECT n FROM init)
    UNION ALL
    (SELECT * FROM c2)
  ),
  c1(n int) AS (
    SELECT n+n FROM c0 WHERE n<5
  ),
  c2(n int) AS (
    (SELECT * FROM c0)
    UNION ALL
    (SELECT * FROM c1)
    UNION ALL
    (SELECT * FROM c1)
  )
SELECT * FROM c2;

used to be

Explained Query:
  Return
    Get l2
  With Mutually Recursive
    cte l2 =
      Union
        Get l0
        Get l1
        Get l1
    cte l1 =
      Project (#1)
        Filter (#0 < 5)
          Map ((#0 + #0))
            Get l0
    cte l0 =
      Union
        Project (#0)
          ReadStorage materialize.public.init
        Get l2

and now it would be

Explained Query:
  With Mutually Recursive
    cte l0 =
      Union
        Project (#0)
          ReadStorage materialize.public.init
        Get l2
    cte l1 =
      Project (#1)
        Filter (#0 < 5)
          Map ((#0 + #0))
            Get l0
    cte l2 =
      Union
        Get l0
        Get l1
        Get l1
  Return
    Get l2

The first commit is straightforward: just a few lines of code changes, and a bunch of expected test output rewrites (most of it with auto-rewrite, except for testdrive).

The second commit is unfortunately not so straightforward. The problem is that the .spec test framework's input format is supposed to correspond to EXPLAIN's output format, but now we are reversing EXPLAIN's output format. There are several options on how to handle this situation:

Do nothing; the spec input format would keep using the old order. This has the problem that we'd have to keep the old order in mind, and any time when we want to create a new spec test in the future, we'd have to manually reverse the CTEs when copy-pasting an EXPLAIN output to be a spec input.
Make the spec parser be able to handle both orders. This is what I did in this PR. This has the advantage compared to 1. that creating new spec tests by copy-pasting an EXPLAIN output is possible. However, it still has a readability problem (for humans) for existing spec test inputs, as one has to keep in mind that the order might go both ways.
Extend the spec test framework to be able to auto-rewrite inputs, not just outputs. We'd have something like a REWRITE_SPEC_INPUTS env var (similar to REWRITE), which would simply roundtrip spec test inputs through the spec parser and EXPLAIN, and use the result to rewrite the inputs. This would be the cleanest solution from a usability perspective. However, it's not totally straightforward to implement, because the spec framework relies on the (external) datadriven framework, which doesn't provide a facility for rewriting inputs out of the box. We'd probably need to crack open datadriven, and copy-paste some code from it to implement this ourselves. I'm open to doing this. I have a feeling that we won't be able to avoid this in the long run, when we keep making further changes to EXPLAIN output.

Now, 2. vs. 3. is open for debate. This PR just did 2. so far, because it was a matter of minutes to implement, but if we want to commit to the spec test framework for the long-term, then sooner or later we'll have to do 3., so might as well do it now (maybe in an immediate follow-up PR), to avoid keeping around the tech-debt of needing to keep in mind both the old and new orders when reading spec test inputs.

Motivation

This PR changes EXPLAIN output, to eliminate a surprising element from it.

Tips for reviewer

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

ggevay · 2025-01-13T14:08:01Z

src/expr-parser/tests/test_mir_parser/with_ctes.spec

Had to change the test in this file manually, because these tests simply expect a clean roundtrip through the spec parser and then EXPLAIN.

ggevay · 2025-01-13T14:08:42Z

src/transform/tests/test_transforms/anf.spec

Changed a test input in this file manually to test the new order in the spec parser.

ggevay · 2025-01-13T14:09:48Z

src/transform/tests/test_transforms/typecheck.spec

The changed inputs in this file were not well-formed before the PR. (Which is more evidence that the old order was confusing.)

ggevay force-pushed the explain-binding-order branch 3 times, most recently from 703db87 to b7555d1 Compare January 13, 2025 11:16

ggevay added 2 commits January 13, 2025 14:33

Reverse CTE order in EXPLAIN output

218470d

Make spec parser be able to handle both cte orders

a864863

ggevay force-pushed the explain-binding-order branch from b7555d1 to a864863 Compare January 13, 2025 13:33

ggevay commented Jan 13, 2025

View reviewed changes

ggevay marked this pull request as ready for review January 13, 2025 14:30

ggevay requested a review from a team as a code owner January 13, 2025 14:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change CTE order in EXPLAIN output #30983

Change CTE order in EXPLAIN output #30983

ggevay commented Jan 9, 2025 •

edited

Loading

ggevay Jan 13, 2025 •

edited

Loading

ggevay Jan 13, 2025 •

edited

Loading

ggevay Jan 13, 2025

Change CTE order in EXPLAIN output #30983

Are you sure you want to change the base?

Change CTE order in EXPLAIN output #30983

Conversation

ggevay commented Jan 9, 2025 • edited Loading

Motivation

Tips for reviewer

Checklist

ggevay Jan 13, 2025 • edited Loading

Choose a reason for hiding this comment

ggevay Jan 13, 2025 • edited Loading

Choose a reason for hiding this comment

ggevay Jan 13, 2025

Choose a reason for hiding this comment

ggevay commented Jan 9, 2025 •

edited

Loading

ggevay Jan 13, 2025 •

edited

Loading

ggevay Jan 13, 2025 •

edited

Loading