REFRESH options for MVs -- Compute #23819
Conversation
Force-pushed from 32289ce to f421c70.
Force-pushed from f421c70 to 7baed08.
Exciting that you've gotten this working end to end, Gabor! This PR is at a size that's really hard to handle in GitHub reviews, though. I'd recommend splitting the parsing, planning, and actual implementation into separate PRs. The parser changes look basically ready to go, for example, and could merge almost immediately (modulo the comments I dropped above). Totally okay for a parser-only PR to immediately error in the planner with "…"
Ok, I'll split this into 3 PRs. Thanks for the comments!
@ggevay is this ready for testing at this time?
@philip-stoev, not yet. Unfortunately, there are several issues with it at the moment, which I'm fixing now, but I got a bit sick, so it's going slowly.
Status: There are several minor issues that should be easy to fix, but there is one serious panic in SLTs. I figured out what's causing the panic with Matt's help, but it's not entirely clear how to fix it. Working on it.
Force-pushed from 1c5a0e5 to dee4de6.
Force-pushed from 912ebd7 to 991fdbb.
I found 2 separate panics with my one-off draft that hackily converts all MVs to `REFRESH EVERY '2 seconds'`:

```
thread 'timely:work-0' panicked at /cargo/git/checkouts/timely-dataflow-70b80d81d6cabd62/de20aa8/timely/src/dataflow/operators/capability.rs:139:13: Attempted to downgrade Capability { time: 1704332460774, internal: "..." } to 774, which is not beyond the capability's time.
```
and
```
thread 'coordinator' panicked at src/sql/src/plan/statement/ddl.rs:2148:29: ALIGNED TO should have been filled in by purification
```
I'm guessing those are both unexpected. The non-matching rows mostly don't look like wrong results, but I'm not sure about controller-frontiers:
```
controller-frontiers.td:305:1: error: non-matching rows: expected:
[["0", "<null>"]]
got:
[["751", "<null>"]]
Poor diff:
- 0 <null>
+ 751 <null>
```
Test results are in https://buildkite.com/materialize/tests/builds/72256 and https://buildkite.com/materialize/nightlies/builds/5842
@def- thank you very much for this test! I've fixed both panics. I pushed the fixes to your PR, and am running CI again. Hopefully, there will be no panics now.
Is it expected that the `REFRESH` option can be repeated? Only the last will have an effect, but the first one is still evaluated, so this panics:

```sql
CREATE SOURCE counter
  FROM LOAD GENERATOR COUNTER
  (TICK INTERVAL '1s');
CREATE MATERIALIZED VIEW mv8 WITH (REFRESH AT mz_unsafe.mz_panic('heh'), REFRESH EVERY '1 second') AS SELECT * FROM counter;
```
Here I'm only seeing the refresh every 10 seconds, not one at 2 seconds after the current time:

```sql
CREATE MATERIALIZED VIEW mv7 WITH (REFRESH AT mz_now()::text::int8 + 2000, REFRESH EVERY '10 seconds') AS SELECT * FROM counter;
COPY (SUBSCRIBE (SELECT * FROM mv7)) TO STDOUT;
```
I'm not sure if this is how we handle options in general or if we want to block this off since someone could accidentally use it and expect both refreshes to have an effect.
@def- Definitely both refresh options should take effect! I'll investigate.
The first one is expected to panic, because both REFRESH options are evaluated, so the `mz_panic('heh')` call runs. The second one … Edit: I pushed a fix: 6c1f168
I still have some questions about what setting `until` actually guarantees about the times that arrive at a dataflow output, and I'd like to see a test for the bootstrap as-of selection. But given that this will be behind a feature flag and disabled by default, I'm fine with merging this version.
Thanks, not seeing any panics anymore.
The non-matching rows might be ok, but I'll definitely investigate them.
I took a quick look based on the latest runs in https://buildkite.com/materialize/tests/builds/72270 and https://buildkite.com/materialize/nightlies/builds/5848
These look worrying:
```
controller-frontiers.td:305:1: error: non-matching rows: expected:
[["0", "<null>"]]
got:
[["603", "<null>"]]
Poor diff:
- 0 <null>
+ 603 <null>
```
```
kafka-avro-upsert-sinks.td:186:1: error: missing record 0: Record {
    headers: [
        "1",
    ],
    key: Some(
        {
            (
                "a",
                Long(1),
            ),
        },
    ),
    value: Some(
        {
            (
                "a",
                Long(1),
            ),
            (
                "b",
                Long(1),
            ),
        },
    ),
}
```
```
temporal.td:18:1: error: executing query failed: db error: ERROR: Timestamp (0) is not valid for all inputs: [Antichain { elements: [700] }]: ERROR: Timestamp (0) is not valid for all inputs: [Antichain { elements: [700] }]
   |
17 |
18 | > SELECT * FROM one_bound1 AS OF 0
   | ^
```

```
> SELECT * FROM foo;
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s 2s 3s 4s 6s 10s 15s 22s 33s 22s
^^^ +++
```
```
timestamps-debezium-kafka.td:160:1: error: non-matching rows: expected:
[["1", "1"], ["2", "2"]]
got:
[]
Poor diff:
- 1 1
- 2 2
```
```
> SELECT
    frontiers.read_frontier,
    frontiers.write_frontier
  FROM mz_internal.mz_frontiers frontiers
  JOIN mz_materialized_views mvs
    ON frontiers.object_id = mvs.id
  WHERE
    mvs.name = 'mv2'
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s 2s 3s 4s 6s 10s 15s 22s 33s 22s
^^^ +++
```
```
controller-frontiers.td:305:1: error: non-matching rows: expected:
[["0", "<null>"]]
got:
[["6", "<null>"]]
Poor diff:
- 0 <null>
+ 6 <null>
```
@def- The non-matching rows where …
@def- The temporal.td error is also expected: REFRESH options modify what times are valid to read from. Originally, 0 was a valid read time because this MV has only constant inputs, but with the REFRESH EVERY the first valid read time will be the time of the first refresh, which will be …
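A hypothetical illustration of this effect, modeled on the temporal.td failure above (the view body and the refresh interval are invented; only the `AS OF 0` behavior comes from the test output):

```sql
-- one_bound1 has only constant inputs, so without a REFRESH option time 0
-- would be a valid read time. (The '1 hour' interval and SELECT 1 body are
-- made up for illustration.)
CREATE MATERIALIZED VIEW one_bound1 WITH (REFRESH EVERY '1 hour')
    AS SELECT 1;

-- With REFRESH EVERY, the first valid read time is the time of the first
-- refresh, so this now fails with
-- "Timestamp (0) is not valid for all inputs", as in the output above.
SELECT * FROM one_bound1 AS OF 0;
```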
I think this is fine as-is. There are several places that could panic, but all of them seem like they could only be triggered when something upstream fails, which isn't ideal but is something we could revisit later.
I did not review the adapter and testing changes.
Merged! Thank you very much for all the discussions and reviews!
This is the Compute part of the first part of the REFRESH options for materialized views epic. It implements all the refresh options mentioned in the design doc, but it doesn't implement automatic replica management yet.
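For orientation, here is a sketch of the option syntax exercised in the conversation above (the specific intervals and offsets are illustrative, both views reuse the counter load-generator source from the repro earlier in the thread, and this is not an exhaustive list of the options in the design doc):

```sql
-- REFRESH EVERY: periodic refreshes at a fixed interval.
CREATE MATERIALIZED VIEW mv_every WITH (REFRESH EVERY '10 seconds')
    AS SELECT * FROM counter;

-- REFRESH AT: a one-off refresh at a specific timestamp (here, roughly
-- 2 seconds after creation, as in the example earlier in the thread).
CREATE MATERIALIZED VIEW mv_at WITH (REFRESH AT mz_now()::text::int8 + 2000)
    AS SELECT * FROM counter;
```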
@jkosh44, could you please review the changes in `interval.rs`?

@teskje, could you please review `sequence_create_materialized_view` + the Compute parts + `refresh_schedule.rs`?

Note that in `sequence_create_materialized_view` I moved the timestamp selection outside of `catalog_transact_with_side_effects`, as discussed with Matt here.

Motivation
Temporary Limitations
No automated replica management yet, as noted above.
If there is an index on the materialized view, then we have a problem after envd restarts: the index's since will be bootstrapped to 1 second before the next refresh, which is most likely in the future, making queries on the MV block until the next refresh. We should enhance the frontier selection during bootstrapping at some point, but for now I'll just ask customers to please not create indexes on these MVs. A hypothetical repro is sketched below.
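A minimal sketch of the problematic shape (the view name, interval, and indexed column are invented; the blocking behavior is the limitation described above):

```sql
-- After an envd restart, the index's since is bootstrapped to just before
-- the next refresh, which for a daily schedule is likely in the future...
CREATE MATERIALIZED VIEW daily_mv WITH (REFRESH EVERY '1 day')
    AS SELECT * FROM counter;
CREATE INDEX daily_mv_idx ON daily_mv (counter);  -- discouraged for now

-- ...so after a restart this query may block until the next refresh.
SELECT * FROM daily_mv;
```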
This first version simply relies on Adapter read holds to keep the sinces close to the present moment. Eventually, we'll want custom compaction windows on `REFRESH EVERY` MVs (and their indexes), because relying on Adapter read holds precludes SERIALIZABLE reads from `REFRESH EVERY` MVs selecting somewhat older timestamps, making SERIALIZABLE reads no faster than STRICT SERIALIZABLE reads, as the sketch below illustrates.
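A hedged illustration of that trade-off, reusing the hypothetical daily_mv from the sketch above (the isolation-setting syntax is standard Materialize SQL; the timing behavior is the expectation stated above, not measured):

```sql
-- Under SERIALIZABLE, a read could in principle choose a slightly older
-- timestamp and return immediately, but with Adapter read holds keeping the
-- since near "now" there is no older timestamp available to choose.
SET TRANSACTION_ISOLATION TO 'SERIALIZABLE';
SELECT * FROM daily_mv;

-- STRICT SERIALIZABLE must pick a fresh timestamp anyway, so the two
-- isolation levels end up waiting similarly on REFRESH EVERY MVs here.
SET TRANSACTION_ISOLATION TO 'STRICT SERIALIZABLE';
SELECT * FROM daily_mv;
```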
Tips for reviewer
Checklist

- If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.