REFRESH options for MVs -- Compute #23819

Merged · 1 commit merged into main from refresh-every on Jan 4, 2024

Conversation

ggevay
Contributor

@ggevay ggevay commented Dec 11, 2023

This PR is the Compute part of the first stage of the REFRESH options for materialized views epic. It implements all the refresh options mentioned in the design doc, but it doesn't implement automatic replica management yet.
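
As a quick illustration of the syntax being added, a materialized view can be given refresh options like the following (a minimal sketch; the table and view names are made up):

-- Hypothetical names, for illustration only.
CREATE MATERIALIZED VIEW orders_by_status
  WITH (
    REFRESH EVERY '1 hour',                   -- periodic refreshes
    REFRESH AT mz_now()::text::int8 + 60000   -- one extra refresh ~1 minute after creation
  )
  AS SELECT status, count(*) FROM orders GROUP BY status;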

@jkosh44, could you please review the changes in interval.rs?

@teskje, could you please review sequence_create_materialized_view + the Compute parts + refresh_schedule.rs?

Note that in sequence_create_materialized_view I moved the timestamp selection outside of the catalog_transact_with_side_effects, as discussed with Matt here.

Motivation

Temporary Limitations

No automated replica management yet, as noted above.

If there is an index on the materialized view, then we have a problem after envd restarts: the index's since will be bootstrapped to 1 sec before the next refresh, which is most likely in the future, so queries on the MV block until the next refresh. We should enhance the frontier selection in the bootstrapping at some point, but for now I'll just ask customers not to create indexes on these MVs.
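
As a hypothetical illustration (all names below are made up), this is the pattern to avoid for now:

-- Hypothetical example; avoid indexes on REFRESH EVERY MVs for now.
CREATE MATERIALIZED VIEW daily_totals
  WITH (REFRESH EVERY '1 day')
  AS SELECT region, sum(amount) AS total FROM sales GROUP BY region;

-- After an envd restart, this index's since would be bootstrapped to 1 sec
-- before the (likely future) next refresh, so queries hitting the index
-- would block until that refresh completes.
CREATE INDEX daily_totals_idx ON daily_totals (region);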

This first version simply relies on Adapter read holds to keep the sinces close to the present moment. Eventually, we'll want custom compaction windows on REFRESH EVERY MVs (and their indexes), because relying on Adapter read holds prevents SERIALIZABLE reads from REFRESH EVERY MVs from selecting somewhat older timestamps, making SERIALIZABLE reads no faster than STRICT SERIALIZABLE reads.

Tips for reviewer

Checklist

@ggevay ggevay force-pushed the refresh-every branch 5 times, most recently from 32289ce to f421c70 Compare December 12, 2023 12:15
@ggevay ggevay marked this pull request as ready for review December 12, 2023 12:19
@ggevay ggevay requested a review from a team as a code owner December 12, 2023 12:19
@ggevay ggevay requested review from a team December 12, 2023 12:19
@ggevay ggevay requested a review from benesch as a code owner December 12, 2023 12:19
@ggevay ggevay requested review from maddyblue, jkosh44 and teskje and removed request for maddyblue December 12, 2023 12:19

shepherdlybot bot commented Dec 12, 2023

Risk Score: 83/100 · Bug Hotspots: 4 · Resilience Coverage: 80%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • Unit Test 🔍 Detected
Risk Summary:

The pull request poses a high risk with a score of 83, driven by predictors such as the average line count and executable lines within files, which historically are associated with a 138% higher likelihood of introducing bugs compared to the repository baseline. The repository has seen a decreasing trend in bugs, but with recent spikes in bug fixes, and a similar pattern is predicted with spikes in riskier pull requests. Additionally, the changes involve 3 files that are known hotspots for bugs.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../src/coord.rs 98
../sequencer/inner.rs 97
../coord/command_handler.rs 92
../statement/ddl.rs 97

@nrainer-materialize nrainer-materialize requested a review from a team December 12, 2023 13:08
@ggevay ggevay added the A-compute (Area: compute) and A-ADAPTER (Topics related to the ADAPTER layer) labels on Dec 12, 2023
src/sql/src/names.rs (outdated review thread, resolved)
@benesch
Member

benesch commented Dec 12, 2023

Exciting that you've gotten this working end to end, Gabor! This PR is at a size that's really hard to handle in GitHub reviews, though. Would recommend trying to split out the parsing, planning, and actual implementation into separate PRs. The parser stuff looks basically ready to go, for example, and could merge basically ASAP (just modulo the comments I dropped above). Totally okay for a parser-only PR to immediately error in the planner with "REFRESH not yet supported", and then for the next planning-only PR to immediately error in the coordinator with "REFRESH not yet supported".

@ggevay
Contributor Author

ggevay commented Dec 12, 2023

Ok, I'll split this into 3 PRs.

Thanks for the comments!

@ggevay ggevay marked this pull request as draft December 12, 2023 15:23
@philip-stoev
Contributor

@ggevay is this ready for testing at this time?

@ggevay
Contributor Author

ggevay commented Dec 13, 2023

@philip-stoev, not yet. Unfortunately, there are several issues with it at the moment, which I'm fixing now, but I got a bit sick, so it's going slowly.

@ggevay
Contributor Author

ggevay commented Dec 13, 2023

Status: There are several minor issues that should be easy to fix, but there is one serious panic in SLTs. I figured out what's causing the panic with Matt's help, but it's not entirely clear how to fix it. Working on it.

@ggevay ggevay force-pushed the refresh-every branch 2 times, most recently from 1c5a0e5 to dee4de6 Compare December 13, 2023 17:58
@ggevay ggevay force-pushed the refresh-every branch 4 times, most recently from 912ebd7 to 991fdbb Compare January 3, 2024 20:33
Contributor

@def- def- left a comment

I found 2 separate panics with my one-off draft that hackily converts all MVs to REFRESH EVERY '2 seconds':

thread 'timely:work-0' panicked at /cargo/git/checkouts/timely-dataflow-70b80d81d6cabd62/de20aa8/timely/src/dataflow/operators/capability.rs:139:13: Attempted to downgrade Capability { time: 1704332460774, internal: "..." } to 774, which is not beyond the capability's time.

and

thread 'coordinator' panicked at src/sql/src/plan/statement/ddl.rs:2148:29: ALIGNED TO should have been filled in by purification

I'm guessing those are both unexpected. The non-matching rows mostly don't look like wrong results, but I'm not sure about the controller-frontiers one:

controller-frontiers.td:305:1: error: non-matching rows: expected:
[["0", "<null>"]]
got:
[["751", "<null>"]]
Poor diff:
- 0 <null>
+ 751 <null>

Test results are in https://buildkite.com/materialize/tests/builds/72256 and https://buildkite.com/materialize/nightlies/builds/5842

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

I found 2 separate panics with my #24210 that hackily converts all MVs to REFRESH EVERY '2 seconds':

@def- thank you very much for this test!

I've fixed both panics:

  • The Attempted to downgrade Capability panic was fortunately already fixed yesterday evening (caught by @antiguru from looking at the code) by 56d8e12
  • The ALIGNED TO panic was caused by forgetting to purify EXPLAIN CREATE MATERIALIZED VIEW, fixed by 4e9f070

I pushed the fixes to your PR, and I'm running CI again. Hopefully, there will be no panics now.

The non-matching rows might be ok, but I'll definitely investigate them.

Contributor

@def- def- left a comment

Is it expected that the REFRESH option can be repeated? Only the last will have an effect, but the first one is still evaluated, so this panics:

CREATE SOURCE counter        
  FROM LOAD GENERATOR COUNTER
  (TICK INTERVAL '1s');      

CREATE MATERIALIZED VIEW mv8 WITH (REFRESH AT mz_unsafe.mz_panic('heh'), REFRESH EVERY '1 second') AS SELECT * FROM counter;

Here I'm only seeing the refresh every 10 seconds, not at 2 seconds after the current time:

CREATE MATERIALIZED VIEW mv7 WITH (REFRESH AT mz_now()::text::int8 + 2000, REFRESH EVERY '10 seconds') AS SELECT * FROM counter;
COPY (SUBSCRIBE (SELECT * FROM mv8)) TO STDOUT;

I'm not sure if this is how we handle options in general or if we want to block this off since someone could accidentally use it and expect both refreshes to have an effect.

@ggevay ggevay marked this pull request as ready for review January 4, 2024 13:05
@ggevay
Contributor Author

ggevay commented Jan 4, 2024

Is it expected that the REFRESH option can be repeated? Only the last will have an effect, but the first one is still evaluated, so this panics:

@def- Definitely both refresh options should take effect! I'll investigate.

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def-

The first one is expected to panic, because both REFRESH options are evaluated, so the mz_unsafe.mz_panic will be called.

The second one creates mv7, but then reads from mv8. If I make the SUBSCRIBE read from mv7, then the refresh 2 seconds after creation does happen for me. However, there is actually a bug here: I treated the last REFRESH AT as the last refresh ever, so the SUBSCRIBE terminates, when it should actually run indefinitely. I'll fix this in a moment.
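
For reference, the second example with the SUBSCRIBE reading from mv7 (as presumably intended) would be:

CREATE MATERIALIZED VIEW mv7 WITH (REFRESH AT mz_now()::text::int8 + 2000, REFRESH EVERY '10 seconds') AS SELECT * FROM counter;
COPY (SUBSCRIBE (SELECT * FROM mv7)) TO STDOUT;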

Edit: I pushed a fix: 6c1f168

src/adapter/src/coord.rs (outdated review thread, resolved)
Contributor

@teskje teskje left a comment

I still have some questions about what setting until actually guarantees about the times that arrive at a dataflow output, and I'd like to see a test for the bootstrap as-of selection. But given that this will be behind a feature flag and disabled by default, I'm fine with merging this version.

src/compute/src/sink/persist_sink.rs (outdated review threads, resolved)
Contributor

@def- def- left a comment

Thanks, not seeing any panics anymore.

The non-matching rows might be ok, but I'll definitely investigate them.

I took a quick look based on the latest runs in https://buildkite.com/materialize/tests/builds/72270 and https://buildkite.com/materialize/nightlies/builds/5848
These look worrying:

controller-frontiers.td:305:1: error: non-matching rows: expected:
[["0", "<null>"]]
got:
[["603", "<null>"]]
Poor diff:
- 0 <null>
+ 603 <null>
kafka-avro-upsert-sinks.td:186:1: error: missing record 0: Record {
    headers: [
        "1",
    ],
    key: Some(
        {
            (
                "a",
                Long(1),
            ),
        },
    ),
    value: Some(
        {
            (
                "a",
                Long(1),
            ),
            (
                "b",
                Long(1),
            ),
        },
    ),
}
temporal.td:18:1: error: executing query failed: db error: ERROR: Timestamp (0) is not valid for all inputs: [Antichain { elements: [700] }]: ERROR: Timestamp (0) is not valid for all inputs: [Antichain { elements: [700] }]
     |
  17 |
  18 | > SELECT * FROM one_bound1 AS OF 0
     | ^
> SELECT * FROM foo;
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s 2s 3s 4s 6s 10s 15s 22s 33s 22s
^^^ +++
timestamps-debezium-kafka.td:160:1: error: non-matching rows: expected:
[["1", "1"], ["2", "2"]]
got:
[]
Poor diff:
- 1 1
- 2 2
> SELECT
  frontiers.read_frontier,
  frontiers.write_frontier
FROM mz_internal.mz_frontiers frontiers
JOIN mz_materialized_views mvs
  ON frontiers.object_id = mvs.id
WHERE
  mvs.name = 'mv2'
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s 2s 3s 4s 6s 10s 15s 22s 33s 22s
^^^ +++
controller-frontiers.td:305:1: error: non-matching rows: expected:
[["0", "<null>"]]
got:
[["6", "<null>"]]
Poor diff:
- 0 <null>
+ 6 <null>

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def-

These look worrying:

Thanks, I'll check them!

Btw., we just made some changes with @antiguru and @teskje, so I'll rebase the testing PR now to re-run. Saving the links to the old runs that your above comment refers to:

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def- The non-matching rows where mz_internal.mz_frontiers is involved are expected, because REFRESH options modify how the frontiers move. So the non-matching rows in controller-frontiers.td can be ignored.

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def- The

temporal.td:18:1: error: executing query failed: db error: ERROR: Timestamp (0) is not valid for all inputs:

is also expected: REFRESH options modify which times are valid to read from. Originally, 0 was valid because this MV has only constant inputs, but with REFRESH EVERY the first valid read time is the time of the first refresh, which is mz_now() by default.
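
A hypothetical sketch of the difference (view names are made up):

-- Without REFRESH options, an MV over constant inputs is readable at time 0:
CREATE MATERIALIZED VIEW const_mv AS SELECT 1 AS x;
SELECT * FROM const_mv AS OF 0;              -- ok

-- With REFRESH EVERY, the earliest readable time is the first refresh
-- (mz_now() at creation by default), so reading AS OF 0 is rejected:
CREATE MATERIALIZED VIEW const_refresh_mv WITH (REFRESH EVERY '1 hour') AS SELECT 1 AS x;
SELECT * FROM const_refresh_mv AS OF 0;      -- ERROR: Timestamp (0) is not valid for all inputs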

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def-
kafka-avro-upsert-sinks.td: The non-matching rows here are also expected: the distribution of data among those 3 kafka-verify-data calls is based on data timestamps in the MV, which are modified by the REFRESH options (times are rounded up to the nearest refresh).

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def-
timestamps-debezium-kafka.td: This one is tricky, but also expected: this test manually controls the progress of the Kafka topics. A REFRESH EVERY materialized view reads from such a topic and gets stuck, because the topic never makes enough progress to reach the time of the first refresh. (Testdrive waiting for the rows to match up doesn't help, because the passage of wall-clock time does not advance the Kafka topic's progress.)

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

@def-
I went through all of the results you flagged as "These look worrying". Thanks for flagging them!

Member

@antiguru antiguru left a comment

I think this is fine as-is. There are several places that could panic, but all of them seem to be triggerable only when something upstream fails, which isn't ideal, but is something we could revisit later.

I did not review adapter and testing changes.

@ggevay
Contributor Author

ggevay commented Jan 4, 2024

Thanks @antiguru!

I'm making some cosmetic changes, then I'll run Nightly and merge.

We'll address @def-'s coverage concerns in a follow-up PR that will add lots of new tests.

@ggevay ggevay merged commit e62c192 into MaterializeInc:main Jan 4, 2024
145 of 155 checks passed
@ggevay
Contributor Author

ggevay commented Jan 4, 2024

Merged! Thank you very much for all the discussions and reviews!

Labels: A-compute (Area: compute)