Add minimal "storage management" to Sled Agent (#344)
Adds a "Storage Manager" to the Sled Agent, which monitors a message queue of zpools.

- Upon arrival of a new zpool, the storage manager labels the zpool with a UUID and splits it into filesystems for Crucible and CockroachDB. A filesystem for ClickHouse, or for other services requiring local storage, could presumably be added the same way.
- The storage manager also starts a zone for each of these services, with the intention of "running these services once storage is available".
- Finally, the storage manager notifies Nexus once the zpool has been initialized.
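
The flow described in the list above can be sketched with standard-library channels. This is a minimal illustration under stated assumptions, not the code added by this commit: `ZpoolReady`, `handle_new_zpool`, `run_storage_worker`, the `pool-id-N` placeholder identifiers, and the `<pool>/<service>` filesystem naming are all hypothetical, and zone startup is omitted entirely.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical notification sent towards Nexus once a zpool is ready.
#[derive(Debug, PartialEq)]
struct ZpoolReady {
    pool_name: String,
    pool_id: String,
    filesystems: Vec<String>,
}

/// Services that receive a dedicated filesystem, per the commit message.
const SERVICES: [&str; 2] = ["crucible", "cockroachdb"];

/// Handles one newly arrived zpool: records its label and derives one
/// filesystem name per service. Nothing is created on disk in this sketch.
fn handle_new_zpool(pool_name: &str, pool_id: &str) -> ZpoolReady {
    let filesystems = SERVICES
        .iter()
        .map(|svc| format!("{}/{}", pool_name, svc))
        .collect();
    ZpoolReady {
        pool_name: pool_name.to_string(),
        pool_id: pool_id.to_string(),
        filesystems,
    }
}

/// Worker loop: drains the queue of zpool names until every sender is
/// dropped, forwarding one notification per pool.
fn run_storage_worker(pools: mpsc::Receiver<String>, nexus: mpsc::Sender<ZpoolReady>) {
    for (index, pool_name) in pools.iter().enumerate() {
        // Placeholder label; the commit message says a UUID is used.
        let pool_id = format!("pool-id-{}", index);
        let ready = handle_new_zpool(&pool_name, &pool_id);
        if nexus.send(ready).is_err() {
            break; // nobody is listening for notifications any more
        }
    }
}

fn main() {
    let (pool_tx, pool_rx) = mpsc::channel();
    let (nexus_tx, nexus_rx) = mpsc::channel();
    let worker = thread::spawn(move || run_storage_worker(pool_rx, nexus_tx));

    pool_tx.send("tank".to_string()).unwrap();
    drop(pool_tx); // closing the queue lets the worker exit
    worker.join().unwrap();

    for note in nexus_rx.iter() {
        println!("{:?}", note);
    }
}
```

Keeping the per-pool work in a plain function separate from the queue loop makes the labelling and filesystem-splitting step testable without any threads.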

Note that none of this occurs without an explicitly supplied zpool: currently, this behavior is triggered
only when explicitly requested during sled agent initialization.
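
As a rough illustration of that opt-in behavior, an absent `zpools` value can be treated as "no pools to manage". Only the `zpools: Option<Vec<String>>` field comes from this commit; the trimmed-down `Config` struct and the `pools_to_manage` helper below are hypothetical.

```rust
/// Hypothetical subset of the sled agent `Config`; only `zpools` mirrors
/// the field added in this commit.
struct Config {
    zpools: Option<Vec<String>>,
}

/// Returns the pools that would be handed to the storage manager.
/// `None` and an empty list both mean that no storage work is triggered.
fn pools_to_manage(config: &Config) -> &[String] {
    config.zpools.as_deref().unwrap_or(&[])
}

fn main() {
    let default_config = Config { zpools: None };
    let explicit_config = Config {
        zpools: Some(vec!["tank".to_string()]),
    };

    println!("default: {} pool(s)", pools_to_manage(&default_config).len());
    println!("explicit: {} pool(s)", pools_to_manage(&explicit_config).len());
}
```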
smklein authored Dec 2, 2021
1 parent 8dc13c9 commit 256dca4
Showing 20 changed files with 1,574 additions and 446 deletions.
72 changes: 36 additions & 36 deletions Cargo.lock

Some generated files are not rendered by default.

12 changes: 10 additions & 2 deletions sled-agent/README.adoc
@@ -17,6 +17,8 @@ between the two implementations.
== Code Tour

* `src/bin`: Contains binaries for the sled agent (and simulated sled agent).
* `src/bootstrap`: Contains bootstrap-related services, operating on a distinct
HTTP endpoint from typical sled operation.
* `src/common`: Shared state machine code between the simulated and real sled agent.
* `src/sim`: Library code responsible for operating a simulated sled agent.
* `src/illumos`: Illumos-specific helpers for accessing OS utilities to manage a sled.
@@ -25,6 +27,12 @@ Additionally, there are some noteworthy top-level files used by the sled agent:

* `src/instance_manager.rs`: Manages multiple instances on a sled.
* `src/instance.rs`: Manages a single instance.
* `src/storage_manager.rs`: Manages storage within a sled.

As well as some utilities:

* `src/running_zone.rs`: RAII wrapper around a running Zone owned by the Sled Agent.
* `src/vnic.rs`: RAII wrapper around VNICs owned by the Sled Agent.

== Life of an Instance

@@ -39,8 +47,8 @@ following steps on initialization to manage OS-local resources.
. ... creates a new "base zone", which contains the necessary pieces to execute
a Propolis server, and as little else as possible. This base zone is derived
from the "sparse" zone template.
-. ... identifies all Oxide-controlled zones (with the prefix `propolis_instance_`)
-and all Oxide-controlled VNICs (with the prefix `vnic_propolis`), which are
+. ... identifies all Oxide-controlled zones (with the prefix `oxz_`)
+and all Oxide-controlled VNICs (with the prefix `ox_vnic_`), which are
removed from the machine.

.To allocate an instance on the Sled, the following steps occur:
11 changes: 11 additions & 0 deletions sled-agent/src/bin/sled-agent.rs
@@ -54,20 +54,29 @@ enum Args {
},
/// Runs the Sled Agent server.
Run {
/// UUID of the Sled Agent.
#[structopt(name = "SA_UUID", parse(try_from_str))]
uuid: Uuid,

/// Socket address of the bootstrap agent.
#[structopt(name = "BA_IP:PORT", parse(try_from_str))]
bootstrap_agent_addr: SocketAddr,

/// Socket address of the sled agent.
#[structopt(name = "SA_IP:PORT", parse(try_from_str))]
sled_agent_addr: SocketAddr,

/// Socket address of Nexus.
#[structopt(name = "NEXUS_IP:PORT", parse(try_from_str))]
nexus_addr: SocketAddr,

/// Optional VLAN, tagged on all guest NICs.
#[structopt(long = "vlan")]
vlan: Option<VlanID>,

/// Optional list of zpools managed by Sled agent.
#[structopt(long = "zpools", name = "zpools", parse(try_from_str))]
zpools: Option<Vec<String>>,
},
}

@@ -98,6 +107,7 @@ async fn do_run() -> Result<(), CmdError> {
sled_agent_addr,
nexus_addr,
vlan,
zpools,
} => {
// Configure and run the Bootstrap server.
let config = BootstrapConfig {
@@ -127,6 +137,7 @@ async fn do_run() -> Result<(), CmdError> {
level: ConfigLoggingLevel::Info,
},
vlan,
zpools,
};

let sled_server = sled_server::Server::start(&config)
2 changes: 2 additions & 0 deletions sled-agent/src/config.rs
@@ -23,4 +23,6 @@ pub struct Config {
pub log: ConfigLogging,
/// Optional VLAN ID to be used for tagging guest VNICs.
pub vlan: Option<VlanID>,
/// Optional list of zpools to be used as "discovered disks".
pub zpools: Option<Vec<String>>,
}
7 changes: 5 additions & 2 deletions sled-agent/src/http_entrypoints.rs
@@ -12,6 +12,7 @@ use dropshot::HttpResponseOk;
use dropshot::Path;
use dropshot::RequestContext;
use dropshot::TypedBody;
use omicron_common::api::external::Error;
use omicron_common::api::internal::nexus::DiskRuntimeState;
use omicron_common::api::internal::nexus::InstanceRuntimeState;
use omicron_common::api::internal::sled_agent::InstanceEnsureBody;
@@ -59,7 +60,8 @@ async fn instance_put(
let body_args = body.into_inner();
Ok(HttpResponseOk(
sa.instance_ensure(instance_id, body_args.initial, body_args.target)
-.await?,
+.await
+.map_err(|e| Error::from(e))?,
))
}

@@ -87,6 +89,7 @@ async fn disk_put(
body_args.initial_runtime.clone(),
body_args.target.clone(),
)
-.await?,
+.await
+.map_err(|e| Error::from(e))?,
))
}
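
The switch from `.await?` to `.await.map_err(|e| Error::from(e))?` in both handlers is consistent with a two-step error conversion: the `?` operator applies only a single `From` conversion, so an intermediate type has to be named explicitly. The sketch below shows that general Rust behavior with stand-in types. `AgentError`, `ApiError`, and `HttpError` are hypothetical, and the assumption that the sled agent methods now return their own error type is an inference from the diff, not something the commit states.

```rust
/// Stand-in for an error type internal to the sled agent (hypothetical).
#[derive(Debug)]
struct AgentError(String);

/// Stand-in for an external API error such as the imported `Error`.
#[derive(Debug)]
struct ApiError(String);

/// Stand-in for the HTTP-level error that a handler returns.
#[derive(Debug, PartialEq)]
struct HttpError(String);

impl From<AgentError> for ApiError {
    fn from(e: AgentError) -> Self {
        ApiError(format!("api: {}", e.0))
    }
}

impl From<ApiError> for HttpError {
    fn from(e: ApiError) -> Self {
        HttpError(format!("http: {}", e.0))
    }
}

/// Hypothetical operation that fails with the agent-internal error.
fn instance_ensure(ok: bool) -> Result<u32, AgentError> {
    if ok {
        Ok(1)
    } else {
        Err(AgentError("zone failed".to_string()))
    }
}

fn handler(ok: bool) -> Result<u32, HttpError> {
    // `instance_ensure(ok)?` alone would not compile here, because there is
    // no `From<AgentError> for HttpError`. `map_err` performs the first hop
    // (AgentError -> ApiError) and `?` performs the second (ApiError -> HttpError).
    let value = instance_ensure(ok).map_err(ApiError::from)?;
    Ok(value)
}

fn main() {
    println!("{:?}", handler(true));
    println!("{:?}", handler(false));
}
```

Passing the function path `ApiError::from` is equivalent to the closure form `|e| ApiError::from(e)` used in the diff.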