Commit

05: add workflow
katilp committed Oct 16, 2024
1 parent 8a45951 commit c7c1182
Showing 5 changed files with 235 additions and 6 deletions.
3 changes: 2 additions & 1 deletion config.yaml
@@ -65,8 +65,9 @@ contact: '[email protected]' # FIXME
episodes:
- 01-intro.md
- 02-storage.md
- 03-disk-image-manual.md
- 03-disk-image.md
- 04-cluster.md
- 05-workflow.md

# Information for Learners
learners:
7 changes: 7 additions & 0 deletions episodes/03-disk-image-manual.md
@@ -209,7 +209,14 @@ gcloud compute instances delete vm-work --zone=europe-west4-c
gcloud compute disks delete pfnano-disk --zone=europe-west4-c
```
## Nota bene
The workflow does not find the container images on this secondary disk on the node.
A visible difference between the images is that the `family` parameter is `secondary-disk-image` for the one built with the disk image tool.
It remains to be confirmed whether setting it on the manually built image helps.
But the fix might be more involved...
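
One way to inspect this yourself is to check the `family` field of each image with `gcloud`; the image name below is a placeholder for your own image name:

```bash
# Print the "family" field of a given image (placeholder name)
gcloud compute images describe <IMAGE_NAME> --format="value(family)"
```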
## Costs
4 changes: 3 additions & 1 deletion episodes/03-disk-image.md
@@ -146,7 +146,9 @@ Message: Quota 'N2_CPUS' exceeded.

are due to the requested machine type not being available in the requested zone. It has nothing to do with your quota.

Try in a different region or with a different machine type. You can give them as parameters e.g. `--zone=europe-west4-a --machine-type=e2-standard-4`.
Independent of the zone specified in the parameters, the disk image will have `eu` as its location, so any zone in `europe` is OK (if you plan to create your cluster in a `europe` zone).


Note that the bucket for logs has to be in the same region, so you might need to create another one. Remove the old one with `gcloud storage rm -r gs://<BUCKET_FOR_LOGS>`.

67 changes: 63 additions & 4 deletions episodes/04-cluster.md
@@ -37,12 +37,72 @@ The output shows your account and project.

Before you can create resources on GCP, you will need to enable the corresponding services.

If this is your first project or you created it from the Google Cloud Console Web UI, it will already have several services enabled.
In addition to what was enabled in the previous section, we will now enable the Kubernetes Engine API (`container.googleapis.com`):

```bash
gcloud services enable container.googleapis.com
```
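
If you want to verify that the API is now enabled, you can list the enabled services, for example:

```bash
# Check that the Kubernetes Engine API shows up among the enabled services
gcloud services list --enabled --filter="config.name:container.googleapis.com"
```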

### Bucket

If you worked through Section 02, you now have a storage bucket for the output files.

List the buckets with

```bash
gcloud storage ls
```

### Secondary disk

If you worked through Section 03, you have a secondary boot disk image available.
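
You can check that the image exists by listing the custom images in your project, for example:

```bash
# List the custom (non-public) images in the current project;
# the secondary boot disk image from Section 03 should appear here
gcloud compute images list --no-standard-images
```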

## Get the code

The example Terraform scripts and Argo Workflows configuration are in the [cms-dpoa/cloud-processing](https://github.com/cms-dpoa/cloud-processing) repository.

Get them with

```bash
git clone git@github.com:cms-dpoa/cloud-processing.git
cd cloud-processing/standard-gke-cluster-gcs-imgdisk
```

## Create the cluster

Set the variables in the `terraform.tfvars` file.

Run

```bash
terraform apply
```

and confirm with "yes" when prompted.
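
If this is a fresh checkout, the working directory may still need to be initialized, and you can preview the planned changes before applying; a standard Terraform sequence would be:

```bash
terraform init   # download the required providers (first run only)
terraform plan   # preview the resources that will be created
```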

## Connect to the cluster and inspect

```bash
gcloud container clusters get-credentials cluster-2 --region europe-west4-a --project hip-new-full-account
```
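
The `get-credentials` command stores the cluster credentials in your kubeconfig; you can check which context `kubectl` now points to with:

```bash
kubectl config current-context
```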

```bash
kubectl get nodes
```

```bash
kubectl get ns
```
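
The next episode deploys Argo Workflows into an `argo` namespace. If it is not already present in the list above (for example created by the Terraform scripts), you can create it yourself:

```bash
# Create the argo namespace only if it does not exist yet
kubectl get namespace argo || kubectl create namespace argo
```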

## Enable image streaming

```bash
gcloud container clusters update cluster-2 --zone europe-west4-a --enable-image-streaming
```
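
To verify the setting, you can inspect the cluster description; assuming image streaming is reported under the `gcfsConfig` field, something like this should show it as enabled:

```bash
gcloud container clusters describe cluster-2 --zone europe-west4-a | grep -A 1 gcfsConfig
```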


## Costs

@@ -53,9 +113,8 @@ If this is your first project or you created it from the Google Cloud Console We

::::::::::::::::::::::::::::::::::::: keypoints

- Google Cloud Storage bucket can be used to store the output files.
- The storage cost depends on the volume stored and for this type of processing is very small.
- The download of the output files from the bucket has a significant cost.
- Kubernetes clusters can be created with Terraform scripts.
- kubectl is the tool to interact with the cluster.


::::::::::::::::::::::::::::::::::::::::::::::::
160 changes: 160 additions & 0 deletions episodes/05-workflow.md
@@ -0,0 +1,160 @@
---
title: "Set up workflow"
teaching: 10
exercises: 5
---

:::::::::::::::::::::::::::::::::::::: questions

- How to set up the Argo Workflows engine?
- How to submit a test job?
- Where to find the output?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Deploy Argo Workflows services to the cluster.
- Submit a test job.
- Find the output in your bucket.

::::::::::::::::::::::::::::::::::::::::::::::::


## Prerequisites


### GCP account and project

Make sure that you are in the GCP account and project that you intend to use for this work. In your Linux terminal, type

```bash
gcloud config list
```

The output shows your account and project.

### Bucket

If you worked through [Section 02](episodes/02-storage), you now have a storage bucket for the output files.

List the buckets with

```bash
gcloud storage ls
```

### Argo CLI

You should have the Argo CLI installed; see [Software setup](index.html#software-setup).
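
You can check the installation with:

```bash
argo version
```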


## Get the code

The example Terraform scripts and Argo Workflows configuration are in the [cms-dpoa/cloud-processing](https://github.com/cms-dpoa/cloud-processing) repository.

Get them with

```bash
git clone git@github.com:cms-dpoa/cloud-processing.git
cd cloud-processing/standard-gke-cluster-gcs-imgdisk
```

## Deploy Argo Workflows services

Deploy Argo Workflows services with

```bash
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.10/install.yaml
kubectl apply -f argo/service_account.yaml
kubectl apply -f argo/argo_role.yaml
kubectl apply -f argo/argo_role_binding.yaml
```

Wait for the services to start.
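
Instead of polling manually, you can also let `kubectl` wait until the deployments are available, for example:

```bash
# Wait until both Argo deployments report as available (give up after 3 minutes)
kubectl wait --for=condition=Available deployment --all -n argo --timeout=180s
```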

You should see the following:

```bash
$ kubectl get all -n argo
NAME READY STATUS RESTARTS AGE
pod/argo-server-5f7b589d6f-jkf4z 1/1 Running 0 24s
pod/workflow-controller-864c88655d-wsfr8 1/1 Running 0 24s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/argo-server ClusterIP 34.118.233.69 <none> 2746/TCP 25s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/argo-server 1/1 1 1 24s
deployment.apps/workflow-controller 1/1 1 1 24s

NAME DESIRED CURRENT READY AGE
replicaset.apps/argo-server-5f7b589d6f 1 1 1 24s
replicaset.apps/workflow-controller-864c88655d 1 1 1 24s
```

## Submit a test job

Edit the parameters in `argo/argo_bucket_run.yaml` so that they are:

```
parameters:
- name: nEvents
#FIXME
# Number of events in the dataset to be processed (-1 is all)
value: 1000
- name: recid
#FIXME
# Record id of the dataset to be processed
value: 30511
- name: nJobs
#FIXME
# Number of jobs the processing workflow should be split into
value: 2
- name: bucket
#FIXME
# Name of cloud storage bucket for storing outputs
value: <YOUR_BUCKET_NAME>
```

Now submit the workflow with

```bash
argo submit -n argo argo/argo_bucket_run.yaml
```

Observe its progress with

```bash
argo get -n argo @latest
```
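
To stream the logs of the running workflow, you can use, for example:

```bash
argo logs -n argo @latest --follow
```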

Once done, check the output in the bucket with

```bash
$ gcloud storage ls gs://<YOUR_BUCKET_NAME>/**
gs://<YOUR_BUCKET_NAME>/pfnano/30511/files_30511.txt
gs://<YOUR_BUCKET_NAME>/pfnano/30511/logs/1.logs
gs://<YOUR_BUCKET_NAME>/pfnano/30511/logs/2.logs
gs://<YOUR_BUCKET_NAME>/pfnano/30511/plots/h_num_cands.png
gs://<YOUR_BUCKET_NAME>/pfnano/30511/plots/h_pdgid_cands.png
gs://<YOUR_BUCKET_NAME>/pfnano/30511/scatter/pfnanooutput1.root
gs://<YOUR_BUCKET_NAME>/pfnano/30511/scatter/pfnanooutput2.root
```
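
If you want to inspect some of the outputs locally, for example the plots, you can copy them from the bucket (keeping in mind that downloads from the bucket have a cost):

```bash
# Copy the plots from the bucket to the current directory
gcloud storage cp "gs://<YOUR_BUCKET_NAME>/pfnano/30511/plots/*.png" .
```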

## Costs






::::::::::::::::::::::::::::::::::::: keypoints

- Once the cluster is up, you will first deploy the Argo Workflows services using `kubectl`.
- You will submit and monitor the workflow with `argo`.
- You can see the output in the bucket with `gcloud` commands or in the Google Cloud Console Web UI.


::::::::::::::::::::::::::::::::::::::::::::::::
