-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sled-agent] Avoid causing UUID conflicts #7266
Conversation
@@ -526,7 +524,7 @@ impl StorageHandle { | |||
// and ask for the set of all datasets from Nexus. | |||
pub async fn upsert_filesystem( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is the only call to upsert_filesystem
. Once we remove the invocation in omicron_zones_ensure
, we can remove this function (and this "optional dataset UUID" argument) altogether.
sled-agent/src/sled_agent.rs
Outdated
// | ||
// This means that: | ||
// - If the dataset exists and has a UUID, this will be a no-op | ||
// - If the dataset doesn't exist, it'll be created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add something like
// - If the dataset doesn't exist, it'll be created | |
// - If the dataset doesn't exist, it'll be created without its oxide:uuid zfs property set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
additional_options: None, | ||
})?; | ||
// Ensure the dataset has a usable UUID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little weirded out by this block of code. Was it strictly unnecessary before? If there's any chance it wasn't, should we keep it but inside an if let Some(dataset_id) { ... }
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a historical artifact -- I'd prefer to remove it if we can, because I think sled agent is plagued by too many of these "two ways to set a value" paths already, and I'd prefer to reduce it to one.
From 256dca4 (2021) -
omicron/sled-agent/src/storage_manager.rs
Lines 232 to 246 in 256dca4
// Idempotently ensure the named dataset exists as a filesystem with a UUID. | |
// | |
// Returns the UUID attached to the ZFS filesystem. | |
fn ensure_dataset_with_id(fs_name: &str) -> Result<Uuid, Error> { | |
Zfs::ensure_filesystem(&fs_name, Mountpoint::Legacy)?; | |
// Ensure the dataset has a usable UUID. | |
if let Ok(id_str) = Zfs::get_oxide_value(&fs_name, "uuid") { | |
if let Ok(id) = id_str.parse::<Uuid>() { | |
return Ok(id); | |
} | |
} | |
let id = Uuid::new_v4(); | |
Zfs::set_oxide_value(&fs_name, "uuid", &id.to_string())?; | |
Ok(id) | |
} |
From ccc28fe (2023) - this was renamed, but still had the structure of "sled agent calls Zfs::ensure_filesystem
, and then sets the UUID:
omicron/sled-agent/src/storage_manager.rs
Lines 269 to 298 in ccc28fe
fn ensure_dataset( | |
&mut self, | |
dataset_id: Uuid, | |
dataset_name: &DatasetName, | |
) -> Result<(), Error> { | |
let zoned = true; | |
let fs_name = &dataset_name.full(); | |
let do_format = true; | |
Zfs::ensure_filesystem( | |
&dataset_name.full(), | |
Mountpoint::Path(PathBuf::from("/data")), | |
zoned, | |
do_format, | |
)?; | |
// Ensure the dataset has a usable UUID. | |
if let Ok(id_str) = Zfs::get_oxide_value(&fs_name, "uuid") { | |
if let Ok(id) = id_str.parse::<Uuid>() { | |
if id != dataset_id { | |
return Err(Error::UuidMismatch { | |
name: dataset_name.clone(), | |
old: id, | |
new: dataset_id, | |
}); | |
} | |
return Ok(()); | |
} | |
} | |
Zfs::set_oxide_value(&fs_name, "uuid", &dataset_id.to_string())?; | |
Ok(()) | |
} |
From de8bc93 (last week) - Zfs::ensure_filesystem
(later renamed to ensure_dataset
) started being able to explicitly set properties, as part of the optimization to avoid calling zfs set
individually for each property.
omicron/illumos-utils/src/zfs.rs
Lines 550 to 557 in de8bc93
let props = build_zfs_set_key_value_pairs(size_details, id); | |
if exists { | |
Self::set_values(name, props.as_slice()).map_err(|err| { | |
EnsureFilesystemError { | |
name: name.to_string(), | |
err: err.err.into(), | |
} | |
})?; |
This was added recently, because (using dtrace) I noticed the callstack was basically:
zfs set <size properties , one at a time>
zfs get uuid
zfs set uuid
And by moving all properties into Zfs::ensure_filesystem
, this could be a single call to get/set
.
All this is to say:
- After de8bc93, any property-managing code after calling
Zfs::ensure_filesystem
should not be necessary. If the UUID was supplied, it should have already been checked/set at this point. - I'd prefer to remove this block of code, rather than diverge the code which mutates dataset UUIDs further.
Does that sound reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah! Yes, I was doing the same digging and came to the same conclusion (i.e., #7236 made this code unnecessary). 👍 on the removal.
let dataset_id = Uuid::new_v4(); | ||
// Create a filesystem on the newly formatted U.2. | ||
// | ||
// We can call "upsert_filesystem" both with and without a UUID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a test for calling with None
and seeing the dataset come back without an ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added a new test: upsert_filesystem_no_uuid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blueprint execution now works on a newly-initialized rack (rel/v12 + this PR backport).
omicron_zones_ensure
. Instead, don't set any dataset UUID, which lets subsequent calls todatasets_ensure
set the right value here.Fixes #7265