Merge branch 'main' into release-v0.1.12
Signed-off-by: Dean Roehrich <[email protected]>
roehrich-hpe committed Dec 9, 2024
2 parents d6d3359 + 56b612a commit b7c8ca7
Showing 2 changed files with 66 additions and 48 deletions.
112 changes: 65 additions & 47 deletions docs/guides/storage-profiles/readme.md
@@ -16,13 +16,15 @@ DW directives that allocate storage on Rabbit nodes allow a `profile` parameter

The administrator shall choose one profile to be the default profile that is used when a profile parameter is not specified.

# Specifying a Profile
## Specifying a Profile

To specify a profile name on a #DW directive, use the `profile` option
```

```shell
#DW jobdw type=lustre profile=durable capacity=5GB name=example
```
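
As a rough illustration of how the `profile` option relates to the default profile, the sketch below picks the `profile=` value out of a directive line and reports when none is given. The function name `dw_profile` is hypothetical and is not part of any Rabbit tooling; the real directive parsing is done by the workflow software.

```shell
# Hypothetical helper (not part of nnf-sos): print the profile named on a
# #DW line, or "(default profile)" when the directive does not name one.
dw_profile() {
    # Unquoted on purpose so the directive is split into words.
    for word in $1; do
        case "$word" in
            profile=*)
                printf '%s\n' "${word#profile=}"
                return 0
                ;;
        esac
    done
    printf '%s\n' "(default profile)"
}

dw_profile '#DW jobdw type=lustre profile=durable capacity=5GB name=example'
# prints: durable
dw_profile '#DW jobdw type=lustre capacity=5GB name=example'
# prints: (default profile)
```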

# Setting A Default Profile
## Setting A Default Profile

A default profile must be defined at all times. Any #DW line that does not specify a profile will use the default profile. If a default profile is not defined, then any new workflows will be rejected. If more than one profile is marked as default then any new workflows will be rejected.
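
The rule above (exactly one default, otherwise new workflows are rejected) can be checked with a small sketch like the following. The function name `default_profile_status` and its messages are assumptions made for illustration; it only counts `true` values on stdin and does not contact the cluster itself.

```shell
# Hypothetical helper: classify the number of profiles flagged as default.
# Reads one "true" or "false" value per line on stdin.
default_profile_status() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    count=$(grep -c '^true$' || true)
    case "$count" in
        0) echo "none: new workflows will be rejected" ;;
        1) echo "ok" ;;
        *) echo "multiple ($count): new workflows will be rejected" ;;
    esac
}

# Possible use against a cluster (assumes the flag lives at .data.default,
# as in the patch commands in this guide):
#   kubectl get nnfstorageprofile -n nnf-system \
#     -o jsonpath='{range .items[*]}{.data.default}{"\n"}{end}' | default_profile_status
```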

@@ -36,16 +38,18 @@ nnf-system performance false 6s
```

To set the default flag on a profile

```shell
$ kubectl patch nnfstorageprofile performance -n nnf-system --type merge -p '{"data":{"default":true}}'
kubectl patch nnfstorageprofile performance -n nnf-system --type merge -p '{"data":{"default":true}}'
```

To clear the default flag on a profile

```shell
$ kubectl patch nnfstorageprofile durable -n nnf-system --type merge -p '{"data":{"default":false}}'
kubectl patch nnfstorageprofile durable -n nnf-system --type merge -p '{"data":{"default":false}}'
```
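
Moving the default from one profile to another combines the two patch commands above. The wrapper below is a minimal sketch; the function name `switch_default_profile` and the `KUBECTL` override variable are assumptions for illustration. Whichever order the patches run in, there is a short window with either no default or two defaults, during which new workflows would be rejected.

```shell
# Hypothetical wrapper: clear the default flag on one profile and set it on
# another. Set KUBECTL=echo to preview the commands without running them.
switch_default_profile() {
    old=$1
    new=$2
    ${KUBECTL:-kubectl} patch nnfstorageprofile "$old" -n nnf-system \
        --type merge -p '{"data":{"default":false}}'
    ${KUBECTL:-kubectl} patch nnfstorageprofile "$new" -n nnf-system \
        --type merge -p '{"data":{"default":true}}'
}

# Example: switch_default_profile durable performance
```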

# Creating The Initial Default Profile
## Creating The Initial Default Profile

Create the initial default profile from scratch or by using the [NnfStorageProfile/template](https://github.com/NearNodeFlash/nnf-sos/blob/master/config/examples/nnf_v1alpha1_nnfstorageprofile.yaml) resource as a template. If `nnf-deploy` was used to install nnf-sos then the default profile described below will have been created automatically.

@@ -87,13 +91,13 @@ nnf-system template false 11s
The administrator should edit the `default` profile to record any cluster-specific settings.
Maintain a copy of this resource YAML in a safe place so it isn't lost across upgrades.

## Keeping The Default Profile Updated
### Keeping The Default Profile Updated

An upgrade of nnf-sos may include updates to the `template` profile. It may be necessary to manually copy these updates into the `default` profile.
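
One way to spot such updates is to compare only the `data:` sections of the two profiles, since metadata such as names and resource versions always differs. The filter below is a sketch under that assumption; the function name `profile_data_section` and the temporary file names are hypothetical. The `default` flag itself is expected to differ between the two profiles.

```shell
# Hypothetical filter: print only the top-level "data:" section of a
# resource YAML read from stdin, stopping at the next top-level key.
profile_data_section() {
    awk '/^data:/ { on = 1; print; next }
         /^[^[:space:]]/ { on = 0 }
         on'
}

# Possible use against a cluster:
#   kubectl get nnfstorageprofile template -n nnf-system -o yaml | profile_data_section > template.data
#   kubectl get nnfstorageprofile default -n nnf-system -o yaml | profile_data_section > default.data
#   diff -u default.data template.data
```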

# Profile Parameters
## Profile Parameters

## XFS
### XFS

The following shows how to specify command line options for pvcreate, vgcreate, lvcreate, and mkfs for XFS storage. Optional mount options are specified one per line

@@ -118,8 +122,7 @@ data:
[...]
```


## GFS2
### GFS2

The following shows how to specify command line options for pvcreate, lvcreate, and mkfs for GFS2.

@@ -140,7 +143,7 @@ data:
[...]
```

## Lustre / ZFS
### Lustre / ZFS

The following shows how to specify a zpool virtual device (vdev). In this case the default vdev is a stripe. See [zpoolconcepts(7)](https://openzfs.github.io/openzfs-docs/man/7/zpoolconcepts.7.html) for virtual device descriptions.

@@ -168,7 +171,7 @@ data:
[...]
```

### ZFS dataset properties
#### ZFS dataset properties

The following shows how to specify ZFS dataset properties in the `--mkfsoptions` arg for mkfs.lustre. See [zfsprops(7)](https://openzfs.github.io/openzfs-docs/man/7/zfsprops.7.html).

@@ -188,9 +191,10 @@ data:
[...]
```

### Mount Options for Targets
#### Mount Options for Targets

##### Persistent Mount Options

#### Persistent Mount Options
Use the mkfs.lustre `--mountfsoptions` parameter to set persistent mount options for Lustre targets.

```yaml
@@ -209,7 +213,8 @@ data:
[...]
```

#### Non-Persistent Mount Options
##### Non-Persistent Mount Options

Non-persistent mount options can be specified with the ostOptions.mountTarget parameter to the NnfStorageProfile:

```yaml
@@ -232,7 +237,7 @@ data:
[...]
```

### Target Layout
#### Target Layout

Users may want Lustre file systems with different performance characteristics. For example, a user job with a single compute node accessing the Lustre file system would see acceptable performance from a single OSS. An FPP workload might want as many OSSs as possible to avoid contention.

@@ -269,7 +274,7 @@ data:
count: 10
```

#### Example Layouts
##### Example Layouts

`scale` with `colocateComputes=true` will likely be the most common layout type to use for `jobdw` directives. This will result in a Lustre file system whose performance scales with the number of compute nodes in the job.

@@ -281,47 +286,60 @@ The `count` field may be useful when creating a persistent file system since the

In general, `scale` gives a simple way for users to get a filesystem that has performance consistent with their job size. `count` is useful for times when a user wants full control of the file system layout.

# Command Line Variables
## Command Line Variables

### pvcreate

- `$DEVICE` - expands to the `/dev/<path>` value for one device that has been allocated

### vgcreate

- `$VG_NAME` - expands to a volume group name that is controlled by Rabbit software.
- `$DEVICE_LIST` - expands to a list of space-separated `/dev/<path>` devices. This list will contain the devices that were iterated over for the pvcreate step.

## pvcreate
### lvcreate

* `$DEVICE` - expands to the `/dev/<path>` value for one device that has been allocated
- `$VG_NAME` - see vgcreate above.
- `$LV_NAME` - expands to a logical volume name that is controlled by Rabbit software.
- `$DEVICE_NUM` - expands to a number indicating the number of devices allocated for the volume group.
- `$DEVICE1, $DEVICE2, ..., $DEVICEn` - each expands to one of the devices from the `$DEVICE_LIST` above.

## vgcreate
### XFS mkfs

* `$VG_NAME` - expands to a volume group name that is controlled by Rabbit software.
* `$DEVICE_LIST` - expands to a list of space-separated `/dev/<path>` devices. This list will contain the devices that were iterated over for the pvcreate step.
- `$DEVICE` - expands to the `/dev/<path>` value for the logical volume that was created by the lvcreate step above.

## lvcreate
### GFS2 mkfs

* `$VG_NAME` - see vgcreate above.
* `$LV_NAME` - expands to a logical volume name that is controlled by Rabbit software.
* `$DEVICE_NUM` - expands to a number indicating the number of devices allocated for the volume group.
* `$DEVICE1, $DEVICE2, ..., $DEVICEn` - each expands to one of the devices from the `$DEVICE_LIST` above.
- `$DEVICE` - expands to the `/dev/<path>` value for the logical volume that was created by the lvcreate step above.
- `$CLUSTER_NAME` - expands to a cluster name that is controlled by Rabbit Software
- `$LOCK_SPACE` - expands to a lock space key that is controlled by Rabbit Software.
- `$PROTOCOL` - expands to a locking protocol that is controlled by Rabbit Software.

## XFS mkfs
### zpool create

* `$DEVICE` - expands to the `/dev/<path>` value for the logical volume that was created by the lvcreate step above.
- `$DEVICE_LIST` - expands to a list of space-separated `/dev/<path>` devices. This list will contain the devices that were allocated for this storage request.
- `$POOL_NAME` - expands to a pool name that is controlled by Rabbit software.
- `$DEVICE_NUM` - expands to a number indicating the number of devices allocated for this storage request.
- `$DEVICE1, $DEVICE2, ..., $DEVICEn` - each expands to one of the devices from the `$DEVICE_LIST` above.

## GFS2 mkfs
### lustre mkfs

* `$DEVICE` - expands to the `/dev/<path>` value for the logical volume that was created by the lvcreate step above.
* `$CLUSTER_NAME` - expands to a cluster name that is controlled by Rabbit Software
* `$LOCK_SPACE` - expands to a lock space key that is controlled by Rabbit Software.
* `$PROTOCOL` - expands to a locking protocol that is controlled by Rabbit Software.
- `$FS_NAME` - expands to the filesystem name that was passed to Rabbit software from the workflow's #DW line.
- `$MGS_NID` - expands to the NID of the MGS. If the MGS was orchestrated by nnf-sos then an appropriate internal value will be used.
- `$POOL_NAME` - see zpool create above.
- `$VOL_NAME` - expands to the volume name that will be created. This value will be `<pool_name>/<dataset>`, and is controlled by Rabbit software.
- `$INDEX` - expands to the index value of the target and is controlled by Rabbit software.

## zpool create
### PostMount/PreUnmount and PostActivate/PreDeactivate

* `$DEVICE_LIST` - expands to a list of space-separated `/dev/<path>` devices. This list will contain the devices that were allocated for this storage request.
* `$POOL_NAME` - expands to a pool name that is controlled by Rabbit software.
* `$DEVICE_NUM` - expands to a number indicating the number of devices allocated for this storage request.
* `$DEVICE1, $DEVICE2, ..., $DEVICEn` - each expands to one of the devices from the `$DEVICE_LIST` above.
- `$MOUNT_PATH` - expands to the mount path of the filesystem, for commands that act on the mounted filesystem.

## lustre mkfs
#### Lustre Specific

* `$FS_NAME` - expands to the filesystem name that was passed to Rabbit software from the workflow's #DW line.
* `$MGS_NID` - expands to the NID of the MGS. If the MGS was orchestrated by nnf-sos then an appropriate internal value will be used.
* `$POOL_NAME` - see zpool create above.
* `$VOL_NAME` - expands to the volume name that will be created. This value will be `<pool_name>/<dataset>`, and is controlled by Rabbit software.
* `$INDEX` - expands to the index value of the target and is controlled by Rabbit software.
These variables are for Lustre only and can be used to perform PostMount activities such as setting Lustre striping.

- `$NUM_MDTS` - expands to the number of MDTs for the lustre filesystem
- `$NUM_MGTS` - expands to the number of MGTs for the lustre filesystem
- `$NUM_MGTMDTS` - expands to the number of combined MGTMDTs for the lustre filesystem
- `$NUM_OSTS` - expands to the number of OSTs for the lustre filesystem
- `$NUM_NNFNODES` - expands to the number of NNF Nodes for the lustre filesystem
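
To make the expansion concrete, the sketch below imitates the substitution for a PostMount command that sets Lustre striping across all OSTs. It is only a stand-in for what Rabbit software does internally: the function name `expand_postmount`, the mount path `/mnt/nnf/example`, and the OST count are assumptions, and only `$MOUNT_PATH` and `$NUM_OSTS` are handled.

```shell
# Hypothetical stand-in for the variable substitution done by Rabbit software.
# $1 = command template, $2 = mount path, $3 = number of OSTs
expand_postmount() {
    # \$ in the sed pattern matches a literal dollar sign in the template.
    printf '%s\n' "$1" |
        sed -e "s|\\\$MOUNT_PATH|$2|g" -e "s|\\\$NUM_OSTS|$3|g"
}

expand_postmount 'lfs setstripe -c $NUM_OSTS $MOUNT_PATH' /mnt/nnf/example 4
# prints: lfs setstripe -c 4 /mnt/nnf/example
```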
2 changes: 1 addition & 1 deletion external/nnf-dm
Submodule nnf-dm updated 88 files
+2 −1 .github/workflows/main.yml
+12 −0 .vscode/launch.json
+46 −5 Dockerfile
+45 −9 Makefile
+8 −8 cmd/main.go
+62 −0 config/copy-offload/copy_offload_role.yaml
+12 −0 config/copy-offload/copy_offload_role_binding.yaml
+18 −0 config/copy-offload/copy_offload_service_account.yaml
+14 −0 config/copy-offload/kustomization.yaml
+4 −0 config/copy-offload/kustomizeconfig.yaml
+6 −0 config/dp0/kustomization.yaml
+13 −0 config/dp0/manager_environment_deploy_patch.yaml
+6 −0 config/kind/kustomization.yaml
+13 −0 config/kind/manager_environment_deploy_patch.yaml
+2 −2 config/manager/kustomization.yaml
+1 −1 config/manager/manager.yaml
+1 −1 config/manager/manager_imagepullsecret_patch.yaml
+0 −2 config/rbac/prometheus/kustomization.yaml
+0 −20 config/rbac/prometheus/monitor.yaml
+3 −0 config/top/kustomization.yaml
+1 −1 crd-bumper.yaml
+44 −44 daemons/compute/server/servers/server_default.go
+126 −0 daemons/copy-offload/cmd/main.go
+82 −0 daemons/copy-offload/pkg/driver/dmrequest.go
+762 −0 daemons/copy-offload/pkg/driver/driver.go
+136 −0 daemons/copy-offload/pkg/server/server.go
+367 −0 daemons/copy-offload/pkg/server/server_test.go
+4 −4 go.mod
+8 −8 go.sum
+44 −764 internal/controller/datamovement_controller.go
+105 −101 internal/controller/datamovement_controller_test.go
+27 −16 internal/controller/datamovementmanager_controller.go
+7 −7 internal/controller/datamovementmanager_controller_test.go
+765 −0 internal/controller/helpers/datamovement_helpers.go
+2 −2 internal/controller/suite_test.go
+1 −0 vendor/github.com/DataWorkflowServices/dws/api/v1alpha2/owner_labels.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/conversion.go
+3 −3 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/groupversion_info.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnf_resource_condition_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnf_resource_health_type.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnf_resource_state_type.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnf_resource_status_type.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnf_resource_type.go
+4 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfaccess_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfaccess_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfcontainerprofile_types.go
+2 −2 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfcontainerprofile_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfdatamovement_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfdatamovement_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfdatamovementmanager_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfdatamovementmanager_webhook.go
+14 −2 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfdatamovementprofile_types.go
+2 −2 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfdatamovementprofile_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnflustremgt_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnflustremgt_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnode_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnode_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnodeblockstorage_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnodeblockstorage_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnodeecdata_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnodeecdata_webhook.go
+6 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnodestorage_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfnodestorage_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfportmanager_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfportmanager_webhook.go
+28 −2 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfstorage_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfstorage_webhook.go
+17 −7 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfstorageprofile_types.go
+2 −2 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfstorageprofile_webhook.go
+9 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfsystemstorage_types.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/nnfsystemstorage_webhook.go
+1 −1 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/workflow_helpers.go
+60 −8 vendor/github.com/NearNodeFlash/nnf-sos/api/v1alpha4/zz_generated.deepcopy.go
+8 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfaccesses.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfcontainerprofiles.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfdatamovementmanagers.yaml
+20 −5 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfdatamovementprofiles.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfdatamovements.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnflustremgts.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfnodeblockstorages.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfnodeecdata.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfnodes.yaml
+39 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfnodestorages.yaml
+4 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfportmanagers.yaml
+184 −27 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfstorageprofiles.yaml
+38 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfstorages.yaml
+21 −4 vendor/github.com/NearNodeFlash/nnf-sos/config/crd/bases/nnf.cray.hpe.com_nnfsystemstorages.yaml
+5 −5 vendor/modules.txt