adapter,storage: expose the catalog shard to SQL #24138
Conversation
Yes, we already want to start using timestamps from the timestamp oracle for writes; we just haven't gotten around to that yet. An open question around this is whether we should use the existing …
With the …
I have it in my branch/PR and it will hopefully land in a couple weeks: #24816
Agreed! I don't like manually advancing the upper continually, so adding it to txn would be the tidiest solution. I thought a tiny bit about the bootstrapping problem, where we have to read from the catalog first in order to set up txn and everything else: I think it can be done. We can read from the catalog shard like from a normal shard, bootstrap txn and things, and then switch over to treating it as a txns shard. That way we get around the chicken-and-egg problem of requiring reading the catalog shard to know the location of the txn shard and potentially other parameters.
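To make the ordering concrete, here is a minimal sketch of the bootstrap sequence described in that comment. Every type and function in it (`CatalogSnapshot`, `TxnsHandle`, and friends) is a hypothetical stand-in, not the real persist or txn machinery; only the read-directly, then bootstrap, then register ordering is the point.

```rust
// All names below are hypothetical stand-ins; only the ordering matters.

/// What we learn by reading the catalog shard directly, before txns exist.
struct CatalogSnapshot {
    /// Location of the txns shard, which is itself stored in the catalog.
    txns_shard_id: String,
}

struct TxnsHandle;

/// Step 1: read the catalog shard like any other persist shard.
fn read_catalog_directly() -> CatalogSnapshot {
    CatalogSnapshot { txns_shard_id: "t-123".to_string() }
}

/// Step 2: use what we read to bootstrap the txn machinery.
fn bootstrap_txns(snapshot: &CatalogSnapshot) -> TxnsHandle {
    println!("opening txns shard {}", snapshot.txns_shard_id);
    TxnsHandle
}

/// Step 3: from here on, treat the catalog shard as a data shard registered
/// in the txns shard, so its upper advances with txn timestamps.
fn register_catalog_in_txns(_txns: &TxnsHandle, _snapshot: &CatalogSnapshot) {}

fn main() {
    let snapshot = read_catalog_directly();
    let txns = bootstrap_txns(&snapshot);
    register_catalog_in_txns(&txns, &snapshot);
}
```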
My vote would be to add a new builtin and …
Yeah, that's interesting! I think resolving all this blocks replacing the builtin tables with views that are derived from the catalog, but maybe doesn't need to block getting this PR merged? The only thing that would give me pause would be how the read policy interacts with the way the catalog shard timestamps are picked. I wouldn't want to accidentally be holding back compaction on the catalog shard because the storage controller is applying a millisecond-based read policy to what is actually catalog versions.
Last I heard, the way storage folks wanted to represent this was as a … I tried a variant of this PR where we added it as a variant to that enum. Mechanically, there's a bunch of … Broadly, I don't think it's unreasonable for somewhere there to be an enum which looks like …
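The enum the comment refers to was collapsed out of this view. Purely as an illustration of the shape being discussed (the names below are hypothetical and are not the real `DataSource` types), it would distinguish collections that storage ingests itself from collections backed by a shard someone else writes:

```rust
// Hypothetical illustration only; not the actual storage controller enum.

/// Where the data for a storage-managed collection comes from.
enum CollectionSource {
    /// Storage runs an ingestion dataflow and owns the backing shard.
    Ingestion { source_id: u64 },
    /// The collection is backed by a persist shard written elsewhere
    /// (e.g. the catalog); storage only reads it and manages its since.
    ExternalShard { shard_id: String },
}

fn describe(source: &CollectionSource) -> String {
    match source {
        CollectionSource::Ingestion { source_id } => {
            format!("ingestion of source {source_id}")
        }
        CollectionSource::ExternalShard { shard_id } => {
            format!("read-only collection backed by shard {shard_id}")
        }
    }
}

fn main() {
    let catalog = CollectionSource::ExternalShard { shard_id: "s123".to_string() };
    println!("{}", describe(&catalog));
}
```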
I was hoping to land this more like next week! Is there a bit of that we could pull out and merge as part of this? Mechanically, all we need really is a task that does the …
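For reference, the kind of task being asked about is sketched below: something that watches the catalog shard's upper and forwards advancements to the storage controller so it can downgrade the since, as described in the PR description's open questions. The types here are hypothetical stand-ins, not the real controller or persist handles.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the storage controller's frontier-tracking API.
struct StorageController;

impl StorageController {
    fn record_write_frontier(&self, upper: u64) {
        println!("catalog upper is now {upper}; controller can downgrade the since");
    }
}

// Hypothetical stand-in for reading the catalog shard's current upper; in the
// real system this would come from a persist write/listen handle.
fn fetch_catalog_upper(tick: u64) -> u64 {
    tick
}

fn main() {
    let controller = StorageController;
    let mut last_upper = 0;
    // A real task would loop forever; a few iterations are enough to show the idea.
    for tick in 1..=3 {
        let upper = fetch_catalog_upper(tick);
        if upper > last_upper {
            controller.record_write_frontier(upper);
            last_upper = upper;
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```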
src/catalog/src/builtin.rs
Outdated
```rust
data_source: BuiltinSourceType::Catalog,
desc: crate::durable::persist::desc(),
is_retained_metrics_object: false,
access: vec![MONITOR_SELECT],
```
I know this is not ready for review, but I'm just leaving this comment so that I don't forget later. I think at first we'll probably want to restrict this to just `mz_system` and `mz_support`.
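As a rough sketch of that policy (not the real builtin ACL machinery, which works through the builtin's `access` grants), the intent is an allowlist of exactly those two roles:

```rust
// Hedged sketch only; the real check goes through the builtin's `access`
// grants rather than a hard-coded role list.

fn can_select_catalog_raw(role_name: &str) -> bool {
    matches!(role_name, "mz_system" | "mz_support")
}

fn main() {
    assert!(can_select_catalog_raw("mz_system"));
    assert!(can_select_catalog_raw("mz_support"));
    // Ordinary user roles are denied.
    assert!(!can_select_catalog_raw("materialize"));
}
```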
Splitting this out of MaterializeInc#24138, which exposes the catalog shard via SQL, because there are some open questions to resolve for the adapter pieces, but the storage part is ready to go. Bonus, this neatly splits the needed code reviewers. The catalog is already a persist shard with the usual `SourceData, (), Timestamp, Diff` type, maintained entirely outside storage. Model this as a storage collection which happens to be backed by a persist shard. We could copy this shard into one managed entirely by storage, but as a performance and complexity optimization, we don't.
Mitigations: Completing required mitigations increases Resilience Coverage.
Risk Summary: The pull request has a high-risk score of 78, driven by the number of files modified and function declarations within those files. Historically, PRs with these predictors are 65% more likely to cause a bug than the repository baseline. Additionally, the repository's observed and predicted bug trends are both decreasing. Note: the risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.
Force-pushed from aa25f00 to a8622a9.
I'm taking this over from @danhhz to give him cycles to drive on the persist sink refactor. I'm happy to report I think this is ready for a full review! The first commit is split out into its own PR (#29768) in case it's easier to merge that commit in isolation. The second commit contains the actual implementation, but it's a pretty small and straightforward change in the end. @petrosagg @jkosh44 — can you take a look? All of the open questions are resolved, and I've added notes about their resolution to the PR description.
LGTM, excited for this to land! Though it would probably be best for a persist person to sign off on `dangerously_expire`, unless @danhhz was the actual author of that one.
```
simple conn=mz_system,user=mz_system
SELECT data->'value'->>'name' AS name
FROM mz_internal.mz_catalog_raw
WHERE data->>'kind' = 'Item' AND data->'key'->'gid'->'value'->'User' IS NOT NULL;
----
foo
COMPLETE 1
```
How does this work without an `AS OF` or without switching to `SERIALIZABLE`?
My guess is it's this bit, which marks it as not being in `Timeline::EpochMillis`: https://github.com/MaterializeInc/materialize/pull/24138/files#diff-aaa12f17e3b6a75cb7184863212b12712a8b1f808df572f69dbd8c39b518d27dR723-R739
Wow, I guess I'm just surprised that the non-EpochMillis timeline machinery actually works out of the box.
```
@@ -2132,6 +2133,19 @@ pub static MZ_COMPUTE_OPERATOR_HYDRATION_STATUSES_PER_WORKER: LazyLock<BuiltinSo
    access: vec![PUBLIC_SELECT],
});

pub static MZ_CATALOG_RAW: LazyLock<BuiltinSource> = LazyLock::new(|| BuiltinSource {
    name: "mz_catalog_raw",
```
`_raw` was meant as a bit of a placeholder when I typed it. Happy to go with whatever, it's easy to change later, but just in case we like any of these better, some other options I've considered are `_json`, `_unified`, `_all`, `_multiplexed`, `_multi`, and `mz_internal.mz_catalog`.
I also thought about other names, and decided I liked `raw` best. And indeed it's very easy to change given that this is `mz_system` only.
src/catalog/src/durable/persist.rs
Outdated
```rust
// TODO: We may need to use a different critical reader
// id for this if we want to be able to introspect it via SQL.
PersistClient::CONTROLLER_CRITICAL_SINCE,
PersistClient::CATALOG_CRITICAL_SINCE,
```
I actually think we just want to delete critical sinces from the durable catalog code. AFAICT the catalog only uses this to ensure that things keep compacted, which the storage controller will now handle. In particular, it doesn't seem to use the critical since as an additional read capability. Though definitely we should get a @jkosh44 double check on this.
Yeah, I'm good with that if you prefer! I'm 99% sure that'll work just fine from an implementation perspective. From a philosophical perspective, I wasn't sure whether we liked the idea of the catalog's read policy being determined by the read policy on `mz_internal.mz_catalog_raw`, or whether we wanted to keep our own read capability around (i.e., what this PR does).
> AFAICT the catalog only uses this to ensure that things keep compacted, which the storage controller will now handle. In particular, it doesn't seem to use the critical since as an additional read capability. Though definitely we should get a @jkosh44 double check on this.

This is correct, we're always reading the most recent timestamp. So we only use this to compact the shard and ensure that some timestamp is readable, i.e. to keep `since <= upper`, unless there's some other mechanism that does this.
Right, with this PR, the storage controller is opening a critical since handle that has a read policy of lagging the upper by 1. So if you're comfortable relying on the storage controller to manage the since handle, we can just remove the critical since handle in the `catalog` crate entirely.
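Concretely, the policy being described matches the `saturating_sub(1)` logic in the catalog code that this PR deletes (visible in the patch further down): hold the since exactly one tick behind the upper. A minimal sketch, assuming plain `u64` timestamps and the `timely` crate's `Antichain`:

```rust
use timely::progress::Antichain;

/// Compute the frontier to downgrade the catalog shard's since to under a
/// "lag the upper by 1" read policy: the most recent complete timestamp
/// stays readable while everything older is allowed to compact.
fn lag_upper_by_one(upper: u64) -> Antichain<u64> {
    Antichain::from_elem(upper.saturating_sub(1))
}

fn main() {
    // With an upper of 42, reads at 41 remain valid.
    assert_eq!(lag_upper_by_one(42).elements(), &[41]);
    // A zero upper never underflows.
    assert_eq!(lag_upper_by_one(0).elements(), &[0]);
}
```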
src/catalog/src/durable/persist.rs
Outdated
```rust
// need to expire our old critical since handle that uses the controller
// critical ID because it uses the wrong opaque type.
//
// TODO(benesch): remove this in v0.121.
```
I haven't tried typing this, but if the following would work, I almost wonder if it's an easier hack to reason about.
- Somewhere before the storage controller gets this shard (here?), make sure the opaque value is at least 1.
- Replace `assert_eq!(reader_state.opaque_codec, O::codec_name());` in `State::CaDS` with something that allows it to go from `"i64"` to `"PersistEpoch"`, which we can remove in the following release.
cc @bkirwi Would love any thoughts you have here
Oh I thought about that, and I'm totally on board with that too! I was just worried that y'all would object to adding a hack in `mz-persist-client` like that because it applies to every shard, and it's really only the catalog shard that we want to allow this transition for.
Yeah, it's for every shard, but it's pretty targeted (nothing else uses `i64` for `Opaque`) and it's easy to reason about things that happen in cmd impls (because linearized modifications to state, yadda yadda). I'd be a little stressed about the expire version because there are all sorts of larger distributed interactions there, and if we hose some prod environment's CONTROLLER_CRITICAL_SINCE, then... we're still gonna be pretty screwed :)
Porting over a comment from @petrosagg on #29768:
The "storage-specific dataflow fragment" is the
I think this is very much on the table! Tagging in @jkosh44, who is presently skunkworksing a change to move the shard into the …
Exactly. In the past, having compute changes not lead to a recompilation of … The …
Gotcha, makes sense.
Force-pushed from cd272a2 to 15a3da7.
Adapter's catalog is now stored in a persist shard with our usual `<SourceData, (), Timestamp, i64>` types. As a result, we can present this shard for introspection to the rest of the system.

A number of internal tables contain information derived entirely from the catalog, which is the source of truth. For these tables, when the catalog changes, we issue a second write to the storage controller Append API. This PR sets us up for starting to replace those tables with `VIEW`s. Not only does this save us the second write's complexity and performance hit, it reduces the chance for discrepancies when catalog changes are happening in multiple places (Pv2).

It also happens to be extremely cool for debugging. Now that the catalog also contains the storage controller state, this allows us to introspect that, too.

```
materialize=> COPY (SUBSCRIBE TO (select data from mz_internal.mz_catalog_raw where data->>'kind' = 'UnfinalizedShard') as of 1) TO STDOUT;
13 1 {"key":{"shard":"sc6df7783-69cb-4b31-9b45-98c8e2799076"},"kind":"UnfinalizedShard"}
materialize=> COPY (SUBSCRIBE TO (select data from mz_internal.mz_catalog_raw where data->>'kind' = 'StorageCollectionMetadata') as of 1) TO STDOUT;
6 1 {"key":{"id":{"value":{"System":450}}},"kind":"StorageCollectionMetadata","value":{"shard":"s04d31384-3a30-48d4-8ca8-6d35464ebd56"}}
...
6 1 {"key":{"id":{"value":{"System":487}}},"kind":"StorageCollectionMetadata","value":{"shard":"s9b99714a-0f13-4653-a6e6-92cf0eab50a8"}}
12 1 {"key":{"id":{"value":{"User":1}}},"kind":"StorageCollectionMetadata","value":{"shard":"sc6df7783-69cb-4b31-9b45-98c8e2799076"}}
13 -1 {"key":{"id":{"value":{"User":1}}},"kind":"StorageCollectionMetadata","value":{"shard":"sc6df7783-69cb-4b31-9b45-98c8e2799076"}}
```

Co-authored-by: Nikhil Benesch <[email protected]>
@jkosh44 @danhhz here's a patch for what it would look like to rely on a single critical since handle in the controller, and do a more surgical migration of the opaque types inside of persist.

```diff
diff --git a/src/catalog/src/durable/persist.rs b/src/catalog/src/durable/persist.rs
index 1ca5b9436f..82ca4a708a 100644
--- a/src/catalog/src/durable/persist.rs
+++ b/src/catalog/src/durable/persist.rs
@@ -28,24 +28,22 @@ use mz_ore::metrics::MetricsFutureExt;
use mz_ore::now::EpochMillis;
use mz_ore::{
soft_assert_eq_no_log, soft_assert_eq_or_log, soft_assert_ne_or_log, soft_assert_no_log,
- soft_assert_or_log, soft_panic_or_log,
+ soft_assert_or_log,
};
use mz_persist_client::cfg::USE_CRITICAL_SINCE_CATALOG;
use mz_persist_client::cli::admin::{CATALOG_FORCE_COMPACTION_FUEL, CATALOG_FORCE_COMPACTION_WAIT};
-use mz_persist_client::critical::SinceHandle;
use mz_persist_client::error::UpperMismatch;
use mz_persist_client::read::{Listen, ListenEvent, ReadHandle};
use mz_persist_client::write::WriteHandle;
use mz_persist_client::{Diagnostics, PersistClient, ShardId};
use mz_persist_types::codec_impls::UnitSchema;
-use mz_persist_types::Opaque;
use mz_proto::{RustType, TryFromProtoError};
use mz_repr::Diff;
use mz_storage_types::sources::SourceData;
use sha2::Digest;
use timely::progress::{Antichain, Timestamp as TimelyTimestamp};
use timely::Container;
-use tracing::{debug, info, warn};
+use tracing::{debug, warn};
use uuid::Uuid;
use crate::durable::debug::{Collection, DebugCatalogState, Trace};
@@ -358,8 +356,6 @@ pub(crate) trait ApplyUpdate<T: IntoStateUpdateKindJson> {
pub(crate) struct PersistHandle<T: TryIntoStateUpdateKind, U: ApplyUpdate<T>> {
/// The [`Mode`] that this catalog was opened in.
pub(crate) mode: Mode,
- /// Since handle to control compaction.
- since_handle: SinceHandle<SourceData, (), Timestamp, Diff, i64>,
/// Write handle to persist.
write_handle: WriteHandle<SourceData, (), Timestamp, Diff>,
/// Listener to catalog changes.
@@ -502,27 +498,6 @@ impl<T: TryIntoStateUpdateKind, U: ApplyUpdate<T>> PersistHandle<T, U> {
return Err(e.into());
}
- // Lag the shard's upper by 1 to keep it readable.
- let downgrade_to = Antichain::from_elem(next_upper.saturating_sub(1));
-
- // The since handle gives us the ability to fence out other downgraders using an opaque token.
- // (See the method documentation for details.)
- // That's not needed here, so we the since handle's opaque token to avoid any comparison
- // failures.
- let opaque = *self.since_handle.opaque();
- let downgrade = self
- .since_handle
- .maybe_compare_and_downgrade_since(&opaque, (&opaque, &downgrade_to))
- .await;
-
- match downgrade {
- None => {}
- Some(Err(e)) => soft_panic_or_log!("found opaque value {e}, but expected {opaque}"),
- Some(Ok(updated)) => soft_assert_or_log!(
- updated == downgrade_to,
- "updated bound should match expected"
- ),
- }
self.sync(next_upper).await?;
Ok(next_upper)
}
@@ -975,17 +950,6 @@ impl UnopenedPersistCatalogState {
}
}
- let since_handle = persist_client
- .open_critical_since(
- catalog_shard_id,
- PersistClient::CATALOG_CRITICAL_SINCE,
- Diagnostics {
- shard_name: CATALOG_SHARD_NAME.to_string(),
- handle_purpose: "durable catalog state critical since".to_string(),
- },
- )
- .await
- .expect("invalid usage");
let (mut write_handle, mut read_handle) = persist_client
.open(
catalog_shard_id,
@@ -1028,7 +992,6 @@ impl UnopenedPersistCatalogState {
let mut handle = UnopenedPersistCatalogState {
// Unopened catalogs are always writeable until they're opened in an explicit mode.
mode: Mode::Writable,
- since_handle,
write_handle,
listen,
persist_client,
@@ -1158,7 +1121,6 @@ impl UnopenedPersistCatalogState {
);
let mut catalog = PersistCatalogState {
mode: self.mode,
- since_handle: self.since_handle,
write_handle: self.write_handle,
listen: self.listen,
persist_client: self.persist_client,
@@ -1223,43 +1185,6 @@ impl UnopenedPersistCatalogState {
.increment_catalog_upgrade_shard_version(self.update_applier.organization_id)
.await;
- // Before v0.120, the durable catalog opened a since handle using the
- // controller's critical ID. Starting in v0.120, the durable catalog
- // uses its own critical ID so that the storage controller can register
- // a critical since handle using a controller critical ID. However, we
- // need to expire our old critical since handle that uses the controller
- // critical ID because it uses the wrong opaque type.
- //
- // TODO(benesch): remove this in v0.121.
- if matches!(self.mode, Mode::Writable) {
- let old_since_handle: SinceHandle<SourceData, (), Timestamp, Diff, i64> = catalog
- .persist_client
- .open_critical_since(
- self.shard_id,
- PersistClient::CONTROLLER_CRITICAL_SINCE,
- Diagnostics {
- shard_name: CATALOG_SHARD_NAME.to_string(),
- handle_purpose: "durable catalog state old critical since".to_string(),
- },
- )
- .await
- .expect("invalid usage");
- let opaque = *old_since_handle.opaque();
- if opaque == <i64 as Opaque>::initial() {
- // If the opaque value is the initial i64 opaque value,
- // we're looking at a critical since handle that an old
- // version of the catalog created. It's safe to call expire
- // on this handle. We don't need to worry about this
- // accidentally finalizing the shard because
- // `catalog.since_handle` is holding back the read frontier.
- info!("expiring old critical since handle for catalog shard");
- old_since_handle.dangerously_expire().await;
- } else {
- info!(%opaque, "not expiring critical since handle for catalog shard; looks new");
- drop(old_since_handle);
- }
- }
-
let write_handle = catalog
.persist_client
.open_writer::<SourceData, (), Timestamp, i64>(
diff --git a/src/persist-client/src/critical.rs b/src/persist-client/src/critical.rs
index 7c4a2cdb55..70062bbd47 100644
--- a/src/persist-client/src/critical.rs
+++ b/src/persist-client/src/critical.rs
@@ -329,23 +329,6 @@ where
// downgrade it to [].
// TODO(bkirwi): revert this when since behaviour on expiry has settled,
// or all readers are associated with a critical handle.
-
- /// Politely expires this reader, releasing its since capability.
- ///
- /// Added back temporarily to faciliate the migration of the catalog shard's
- /// critical since handler to the controller. This migration is careful to
- /// uphold the invariant that an empty `state.critical_readers` means that
- /// the shard has never had a critical reader registered--i.e., the
- /// migration ensures that the shard always has at least one other critical
- /// reader registered before calling this method.
- ///
- /// TODO(benesch): remove this in v0.121.
- #[doc(hidden)]
- #[instrument(level = "debug", fields(shard = %self.machine.shard_id()))]
- pub async fn dangerously_expire(mut self) {
- let (_, maintenance) = self.machine.expire_critical_reader(&self.reader_id).await;
- maintenance.start_performing(&self.machine, &self.gc);
- }
}
#[cfg(test)]
diff --git a/src/persist-client/src/internal/state.rs b/src/persist-client/src/internal/state.rs
index 09cc036f91..83b3d77c33 100644
--- a/src/persist-client/src/internal/state.rs
+++ b/src/persist-client/src/internal/state.rs
@@ -1391,6 +1391,15 @@ where
}
let reader_state = self.critical_reader(reader_id);
+
+ // One-time migration of catalog shard from i64 opaques to PersistEpoch
+ // opaques.
+ //
+ // TODO(benesch): remove in v0.121.
+ let initial_i64_opaque = OpaqueState(<i64 as Codec64>::encode(&<i64 as Opaque>::initial()));
+ if reader_state.opaque_codec == "i64" && reader_state.opaque == initial_i64_opaque {
+ reader_state.opaque_codec = "PersistEpoch".into();
+ }
assert_eq!(reader_state.opaque_codec, O::codec_name());
if &O::decode(reader_state.opaque.0) != expected_opaque {
diff --git a/src/persist-client/src/lib.rs b/src/persist-client/src/lib.rs
index ccb76b2b94..f512f46140 100644
--- a/src/persist-client/src/lib.rs
+++ b/src/persist-client/src/lib.rs
@@ -410,20 +410,6 @@ impl PersistClient {
Ok(fetcher)
}
-
- /// A convenience [CriticalReaderId] for the catalog shard.
- ///
- /// ```rust
- /// // This prints as something that is not 0 but is visually recognizable.
- /// assert_eq!(
- /// mz_persist_client::PersistClient::CATALOG_CRITICAL_SINCE.to_string(),
- /// "c55555555-6666-7777-8888-999999999999",
- /// )
- /// ```
- pub const CATALOG_CRITICAL_SINCE: CriticalReaderId = CriticalReaderId([
- 85, 85, 85, 85, 102, 102, 119, 119, 136, 136, 153, 153, 153, 153, 153, 153,
- ]);
-
/// A convenience [CriticalReaderId] for Materialize controllers.
///
    /// For most (soon to be all?) shards in Materialize, a centralized
```
Yeah, that diff seems great to me, assuming CI is happy with it.
Force-pushed from 132244e to 6db14c8.
Cool. I've optimistically punched it in.
The critical since handle that was previously owned by the catalog will now be owned by the controller, but the controller uses a different opaque type for the handle. Add a migration to expire the old critical since handle created by the catalog so that it can be re-registered with the correct opaque type by the storage controller. Without this migration, when upgrading from v0.119, the storage controller panics during bootstrapping when calling `compare_and_downgrade_since` on the catalog shard.
Cleaning up old PRs
Adapter's catalog is now stored in a persist shard with our usual `<SourceData, (), Timestamp, i64>` types. As a result, we can present this shard for introspection to the rest of the system.

A number of internal tables contain information derived entirely from the catalog, which is the source of truth. For these tables, when the catalog changes, we issue a second write to the storage controller Append API.

This PR sets us up for starting to replace those tables with `VIEW`s. Not only does this save us the second write's complexity and performance hit, it reduces the chance for discrepancies when catalog changes are happening in multiple places (Pv2).

It also happens to be extremely cool for debugging. Now that the catalog also contains the storage controller state, this allows us to introspect that, too.
Motivation
Tips for reviewer
Open questions:
- Storage: The last time we talked about this, we decided to model this conceptually as a "source" external to storage that happens to be ingested from a persist shard, with a performance optimization to read the shard directly instead of copying it to a new shard. But mechanically, I forgot what we said: was it `DataSource::Other(DataSourceOther::Shard(ShardId))`? We use `DataSourceOther::Compute` for this; just need to give that variant a more generic name. storage-controller: flatten DataSourceOther (#29768) does the trick.
- Storage: The new storage controller durability stuff requires that we write down the shard ids before the `create_collections` call, which means we only see the `ShardId` in the `DataSource` too late. I've got a big hack and a WIP comment here in the draft. Lots of options for fixing this and I don't have a particular preference, so happy to type up whatever y'all prefer!
- Adapter: How do we want to model this in `builtins.rs` and the `CatalogEntry`? I'd prefer to avoid adding a new variant to `DataSourceDesc` since that gets matched in a ton of places and AFAICT `DataSourceDesc::Introspection` is semantically correct (enough). We add a `BuiltinSource` object as we do for other builtin sources, but we've introduced a `DataSourceIntrospectionDesc` that distinguishes between the catalog introspection and other types of storage introspection collections.
- Joe: We'll probably want to start using more realtime timestamps for the writes instead of just incrementing by 1. I assume that's not particularly hard? The thing I haven't figured out yet is if we need to start advancing the upper of the catalog shard as time passes, or if there's some nice way around that. (A rough sketch of one approach appears after this list.)
- Aljoscha: This will want a bit of code that watches for upper advancements and sends them to the storage controller to power the since downgrades. IIRC some of your upcoming controller work already does something like that. Is that landing soon and/or is there a bit we should pull out and merge to reuse here?
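Below is a rough sketch, under stated assumptions, of the "advance the upper as time passes" option from the Joe bullet above: a small task that periodically bumps the catalog shard's upper to the current wall-clock time by appending an empty batch. `CatalogWriteHandle` is a hypothetical stand-in for the real persist write handle.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a persist write handle on the catalog shard.
struct CatalogWriteHandle {
    upper: u64,
}

impl CatalogWriteHandle {
    /// Stand-in for appending an empty batch: advances the upper without
    /// writing any data, so readers can pick recent timestamps.
    fn advance_upper_to(&mut self, new_upper: u64) {
        if new_upper > self.upper {
            self.upper = new_upper;
        }
    }
}

fn now_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("wall clock before 1970")
        .as_millis() as u64
}

fn main() {
    let mut handle = CatalogWriteHandle { upper: 0 };
    // In a real system this would run on a timer; one tick shown here.
    handle.advance_upper_to(now_millis());
    println!("catalog upper advanced to {}", handle.upper);
}
```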
Checklist
- … `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.