diff --git a/config.yaml b/config.yaml
index 94718ed..f4d8799 100644
--- a/config.yaml
+++ b/config.yaml
@@ -65,8 +65,9 @@ contact: 'cms-dpoa-coordinator@cern.ch' # FIXME
 episodes:
 - 01-intro.md
 - 02-storage.md
-- 03-disk-image-manual.md
+- 03-disk-image.md
 - 04-cluster.md
+- 05-workflow.md
 
 # Information for Learners
 learners:
diff --git a/episodes/03-disk-image-manual.md b/episodes/03-disk-image-manual.md
index 2afddec..eeb0e9c 100644
--- a/episodes/03-disk-image-manual.md
+++ b/episodes/03-disk-image-manual.md
@@ -209,7 +209,14 @@
 gcloud compute instances delete vm-work --zone=europe-west4-c
 gcloud compute disks delete pfnano-disk --zone=europe-west4-c
 ```
 
+## Nota bene
+
+The workflow does not find the container images on this secondary disk on the node.
+A visible difference between the images is that the "family" parameter is `secondary-disk-image` for the one built from the disk.
+
+TBC whether setting the family to `secondary-disk-image` helps.
+
+But it might be more involved...
 
 ## Costs
diff --git a/episodes/03-disk-image.md b/episodes/03-disk-image.md
index f0b5ee7..32cb876 100644
--- a/episodes/03-disk-image.md
+++ b/episodes/03-disk-image.md
@@ -146,7 +146,9 @@ Message: Quota 'N2_CPUS' exceeded.
 are due to requested machine type no being available in the requested zone. Nothing to do with you quota.
 
-Try in a different region or with a different machine type. You can give them as parameters e.g. `--zone=europe-west4-a --machine-type=e2-standard-4`
+Try in a different region or with a different machine type. You can give them as parameters, e.g. `--zone=europe-west4-a --machine-type=e2-standard-4`.
+Independent of the zone specified in the parameters, the disk image will have `eu` as its location, so any zone in `europe` is OK (if you plan to create your cluster in a zone in `europe`).
+
 Note that the bucket for logs has to be in the same region so you might need to create another one. Remove the old one with `gcloud storage rm -r gs://`.
diff --git a/episodes/04-cluster.md b/episodes/04-cluster.md
index 447d572..00aec6f 100644
--- a/episodes/04-cluster.md
+++ b/episodes/04-cluster.md
@@ -37,12 +37,72 @@ The output shows your account and project.
 
 Before you can create resources on GCP, you will need to enable them
 
-If this is your first project or you created it from the Google Cloud Console Web UI, it will have several services enabled.
+In addition to what was enabled in the previous section, we will now enable the Kubernetes Engine API (container.googleapis.com):
+
+```bash
+gcloud services enable container.googleapis.com
+```
+
+### Bucket
+
+If you worked through Section 02, you now have a storage bucket for the output files.
+
+List the buckets with
+
+```bash
+gcloud storage ls
+```
+
+### Secondary disk
+
+If you worked through Section 03, you have a secondary boot disk image available.
 
 ## Get the code
 
+The example Terraform scripts and Argo Workflows configuration are in the [cms-dpoa/cloud-processing](https://github.com/cms-dpoa/cloud-processing) repository.
+
+Get them with
+
+```bash
+git clone git@github.com:cms-dpoa/cloud-processing.git
+cd cloud-processing/standard-gke-cluster-gcs-imgdisk
+```
+
 ## Create the cluster
 
+Set the variables in the `terraform.tfvars` file.
+
+Run
+
+```bash
+terraform apply
+```
+
+and confirm "yes".
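+
+Once the apply completes, you can check from the command line that the cluster exists (a quick sanity check; the cluster name `cluster-2` is an assumption based on the commands used later in this section and should match what you set in `terraform.tfvars`):
+
+```bash
+# List the GKE clusters in the current project; the new cluster should appear here
+gcloud container clusters list
+```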
+
+## Connect to the cluster and inspect
+
+```bash
+gcloud container clusters get-credentials cluster-2 --region europe-west4-a --project hip-new-full-account
+```
+
+```bash
+kubectl get nodes
+```
+
+```bash
+kubectl get ns
+```
+
+## Enable image streaming
+
+```bash
+gcloud container clusters update cluster-2 --zone europe-west4-a --enable-image-streaming
+```
+
 ## Costs
@@ -53,9 +113,8 @@ If this is your first project or you created it from the Google Cloud Console We
 
 ::::::::::::::::::::::::::::::::::::: keypoints
 
-- Google Cloud Storage bucket can be used to store the output files.
-- The storage cost depends on the volume stored and for this type of processing is very small.
-- The download of the output files for the bucket has a signicant cost.
+- Kubernetes clusters can be created with Terraform scripts.
+- `kubectl` is the tool to interact with the cluster.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
diff --git a/episodes/05-workflow.md b/episodes/05-workflow.md
new file mode 100644
index 0000000..81e66f8
--- /dev/null
+++ b/episodes/05-workflow.md
@@ -0,0 +1,160 @@
+---
+title: "Set up workflow"
+teaching: 10
+exercises: 5
+---
+
+:::::::::::::::::::::::::::::::::::::: questions
+
+- How to set up the Argo Workflows engine?
+- How to submit a test job?
+- Where to find the output?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Deploy Argo Workflows services to the cluster.
+- Submit a test job.
+- Find the output in your bucket.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+## Prerequisites
+
+
+### GCP account and project
+
+Make sure that you are in the GCP account and project that you intend to use for this work. In your Linux terminal, type
+
+```bash
+gcloud config list
+```
+
+The output shows your account and project.
+
+### Bucket
+
+If you worked through [Section 02](episodes/02-storage), you now have a storage bucket for the output files.
+
+List the buckets with
+
+```bash
+gcloud storage ls
+```
+
+### Argo CLI
+
+You should have the Argo CLI installed, see [Software setup](index.html#software-setup).
+
+
+## Get the code
+
+The example Terraform scripts and Argo Workflows configuration are in the [cms-dpoa/cloud-processing](https://github.com/cms-dpoa/cloud-processing) repository.
+
+Get them with
+
+```bash
+git clone git@github.com:cms-dpoa/cloud-processing.git
+cd cloud-processing/standard-gke-cluster-gcs-imgdisk
+```
+
+## Deploy Argo Workflows services
+
+Deploy the Argo Workflows services with
+
+```bash
+kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.10/install.yaml
+kubectl apply -f argo/service_account.yaml
+kubectl apply -f argo/argo_role.yaml
+kubectl apply -f argo/argo_role_binding.yaml
+```
+
+Wait for the services to start.
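+
+If you prefer a command that blocks instead of polling by hand, `kubectl wait` can pause until the pods report Ready (a convenience sketch; the 5-minute timeout is an arbitrary choice):
+
+```bash
+# Wait until all pods in the argo namespace are Ready, or give up after 5 minutes
+kubectl wait pods --all -n argo --for=condition=Ready --timeout=300s
+```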
+
+You should see the following:
+
+```bash
+$ kubectl get all -n argo
+NAME                                       READY   STATUS    RESTARTS   AGE
+pod/argo-server-5f7b589d6f-jkf4z           1/1     Running   0          24s
+pod/workflow-controller-864c88655d-wsfr8   1/1     Running   0          24s
+
+NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
+service/argo-server   ClusterIP   34.118.233.69   <none>        2746/TCP   25s
+
+NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
+deployment.apps/argo-server           1/1     1            1           24s
+deployment.apps/workflow-controller   1/1     1            1           24s
+
+NAME                                             DESIRED   CURRENT   READY   AGE
+replicaset.apps/argo-server-5f7b589d6f           1         1         1       24s
+replicaset.apps/workflow-controller-864c88655d   1         1         1       24s
+```
+
+## Submit a test job
+
+Edit the parameters in the `argo/argo_bucket_run.yaml` file so that they are
+
+```yaml
+  parameters:
+  - name: nEvents
+    #FIXME
+    # Number of events in the dataset to be processed (-1 is all)
+    value: 1000
+  - name: recid
+    #FIXME
+    # Record id of the dataset to be processed
+    value: 30511
+  - name: nJobs
+    #FIXME
+    # Number of jobs the processing workflow should be split into
+    value: 2
+  - name: bucket
+    #FIXME
+    # Name of cloud storage bucket for storing outputs
+    value: <bucket_name>
+```
+
+Now submit the workflow with
+
+```bash
+argo submit -n argo argo/argo_bucket_run.yaml
+```
+
+Observe its progress with
+
+```bash
+argo get -n argo @latest
+```
+
+Once done, check the output in the bucket with
+
+```bash
+$ gcloud storage ls gs://<bucket_name>/**
+gs://<bucket_name>/pfnano/30511/files_30511.txt
+gs://<bucket_name>/pfnano/30511/logs/1.logs
+gs://<bucket_name>/pfnano/30511/logs/2.logs
+gs://<bucket_name>/pfnano/30511/plots/h_num_cands.png
+gs://<bucket_name>/pfnano/30511/plots/h_pdgid_cands.png
+gs://<bucket_name>/pfnano/30511/scatter/pfnanooutput1.root
+gs://<bucket_name>/pfnano/30511/scatter/pfnanooutput2.root
+```
+
+## Costs
+
+
+
+
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- Once the cluster is up, you will first deploy the Argo Workflows services using `kubectl`.
+- You will submit and monitor the workflow with `argo`.
+- You can see the output in the bucket with `gcloud` commands or on the Google Cloud Console Web UI.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::
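+
+As a follow-up to the last keypoint, the output files can also be copied out of the bucket for local inspection (a minimal sketch; replace `<bucket_name>` with your bucket, and the path follows the listing above):
+
+```bash
+# Copy the plots produced by the test workflow to the current directory
+gcloud storage cp "gs://<bucket_name>/pfnano/30511/plots/*.png" .
+```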