Showing 5 changed files with 235 additions and 6 deletions.
```diff
@@ -65,8 +65,9 @@ contact: '[email protected]' # FIXME
 episodes:
 - 01-intro.md
 - 02-storage.md
-- 03-disk-image-manual.md
+- 03-disk-image.md
 - 04-cluster.md
+- 05-workflow.md

 # Information for Learners
 learners:
```
@@ -37,12 +37,72 @@ The output shows your account and project.
Before you can create resources on GCP, you will need to enable the corresponding services.

If this is your first project, or you created it from the Google Cloud Console web UI, it will already have several services enabled. In addition to what was enabled in the previous section, we will now enable the Kubernetes Engine API (`container.googleapis.com`):
```bash
gcloud services enable container.googleapis.com
```
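As a quick check (not part of the original lesson steps), you can filter the list of enabled services for the API you just turned on:

```bash
# Prints container.googleapis.com if the Kubernetes Engine API is enabled
gcloud services list --enabled --filter="name:container.googleapis.com"
```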
### Bucket

If you worked through Section 02, you now have a storage bucket for the output files.

List the buckets with

```bash
gcloud storage ls
```
### Secondary disk

If you worked through Section 03, you have a secondary boot disk image available.
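To confirm the image is there, you can list the custom (non-Google-provided) images in your project; the exact image name depends on what you chose in Section 03:

```bash
# --no-standard-images restricts the listing to images in the current project
gcloud compute images list --no-standard-images
```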
## Get the code

The example Terraform scripts and Argo Workflows configuration are in the [cms-dpoa/cloud-processing](https://github.com/cms-dpoa/cloud-processing) repository.

Get them with
```bash
git clone git@github.com:cms-dpoa/cloud-processing.git
cd cloud-processing/standard-gke-cluster-gcs-imgdisk
```
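If you do not have SSH keys set up for GitHub, cloning over HTTPS works just as well:

```bash
git clone https://github.com/cms-dpoa/cloud-processing.git
cd cloud-processing/standard-gke-cluster-gcs-imgdisk
```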
## Create the cluster

Set the variables in the `terraform.tfvars` file.
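On a fresh checkout, Terraform also needs to download the providers that the scripts reference, so initialize the working directory first (this step is implied but not shown here):

```bash
# Downloads the required providers and prepares the local state
terraform init
```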
Run

```bash
terraform apply
```

and confirm with "yes".
## Connect to the cluster and inspect

Get the cluster credentials for `kubectl` with

```bash
gcloud container clusters get-credentials cluster-2 --zone europe-west4-a --project hip-new-full-account
```

Check that the cluster nodes and the Kubernetes namespaces are visible:

```bash
kubectl get nodes
```

```bash
kubectl get ns
```
## Enable image streaming

Image streaming allows GKE to start containers before the whole image has been pulled, which speeds up job start-up with large images. Enable it with

```bash
gcloud container clusters update cluster-2 --zone europe-west4-a --enable-image-streaming
```
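To verify that the update took effect, you can inspect the cluster description; assuming GKE reports image streaming under its internal GCFS name, the following should show `enabled: true`:

```bash
# gcfsConfig is the cluster-description field backing image streaming (assumption)
gcloud container clusters describe cluster-2 --zone europe-west4-a | grep -A 1 gcfsConfig
```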
## Costs
@@ -53,9 +113,8 @@ If this is your first project or you created it from the Google Cloud Console Web UI
::::::::::::::::::::::::::::::::::::: keypoints

- A Google Cloud Storage bucket can be used to store the output files.
- The storage cost depends on the volume stored and is very small for this type of processing.
- Downloading the output files from the bucket has a significant cost.
- Kubernetes clusters can be created with Terraform scripts.
- `kubectl` is the tool to interact with the cluster.

::::::::::::::::::::::::::::::::::::::::::::::::
@@ -0,0 +1,160 @@
---
title: "Set up workflow"
teaching: 10
exercises: 5
---

:::::::::::::::::::::::::::::::::::::: questions

- How do you set up the Argo Workflows engine?
- How do you submit a test job?
- Where do you find the output?

::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: objectives

- Deploy Argo Workflows services to the cluster.
- Submit a test job.
- Find the output in your bucket.

::::::::::::::::::::::::::::::::::::::::::::::::
## Prerequisites

### GCP account and project

Make sure that you are in the GCP account and project that you intend to use for this work. In your Linux terminal, type

```bash
gcloud config list
```

The output shows your account and project.
### Bucket

If you worked through [Section 02](episodes/02-storage), you now have a storage bucket for the output files.

List the buckets with

```bash
gcloud storage ls
```
### Argo CLI

You should have the Argo CLI installed; see [Software setup](index.html#software-setup).
## Get the code

The example Terraform scripts and Argo Workflows configuration are in the [cms-dpoa/cloud-processing](https://github.com/cms-dpoa/cloud-processing) repository.

Get them with
```bash
git clone git@github.com:cms-dpoa/cloud-processing.git
cd cloud-processing/standard-gke-cluster-gcs-imgdisk
```
## Deploy Argo Workflows service

Deploy the Argo Workflows services with

```bash
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.10/install.yaml
kubectl apply -f argo/service_account.yaml
kubectl apply -f argo/argo_role.yaml
kubectl apply -f argo/argo_role_binding.yaml
```
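Note that `kubectl apply -n argo` assumes the `argo` namespace already exists (the Terraform scripts may have created it). If it does not, create it first:

```bash
# Create the namespace the Argo services are installed into
kubectl create namespace argo
```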
Wait for the services to start.

You should see the following:
```bash
$ kubectl get all -n argo
NAME                                       READY   STATUS    RESTARTS   AGE
pod/argo-server-5f7b589d6f-jkf4z           1/1     Running   0          24s
pod/workflow-controller-864c88655d-wsfr8   1/1     Running   0          24s

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/argo-server   ClusterIP   34.118.233.69   <none>        2746/TCP   25s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/argo-server           1/1     1            1           24s
deployment.apps/workflow-controller   1/1     1            1           24s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/argo-server-5f7b589d6f           1         1         1       24s
replicaset.apps/workflow-controller-864c88655d   1         1         1       24s
```
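If the pods are not yet `Running`, you can watch them come up before proceeding:

```bash
# -w keeps the listing open and prints updates as pod status changes
kubectl get pods -n argo -w
```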
## Submit a test job

Edit the parameters in `argo/argo_bucket_run.yaml` so that they match your setup:

```yaml
parameters:
  - name: nEvents
    #FIXME
    # Number of events in the dataset to be processed (-1 is all)
    value: 1000
  - name: recid
    #FIXME
    # Record id of the dataset to be processed
    value: 30511
  - name: nJobs
    #FIXME
    # Number of jobs the processing workflow should be split into
    value: 2
  - name: bucket
    #FIXME
    # Name of cloud storage bucket for storing outputs
    value: <YOUR_BUCKET_NAME>
```
Now submit the workflow with

```bash
argo submit -n argo argo/argo_bucket_run.yaml
```

Observe its progress with

```bash
argo get -n argo @latest
```
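If you prefer a live view that refreshes until the workflow completes, `argo watch` shows the same information as `argo get` but keeps updating:

```bash
# @latest refers to the most recently submitted workflow
argo watch -n argo @latest
```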
Once done, check the output in the bucket with
```bash
$ gcloud storage ls gs://<YOUR_BUCKET_NAME>/**
gs://<YOUR_BUCKET_NAME>/pfnano/30511/files_30511.txt
gs://<YOUR_BUCKET_NAME>/pfnano/30511/logs/1.logs
gs://<YOUR_BUCKET_NAME>/pfnano/30511/logs/2.logs
gs://<YOUR_BUCKET_NAME>/pfnano/30511/plots/h_num_cands.png
gs://<YOUR_BUCKET_NAME>/pfnano/30511/plots/h_pdgid_cands.png
gs://<YOUR_BUCKET_NAME>/pfnano/30511/scatter/pfnanooutput1.root
gs://<YOUR_BUCKET_NAME>/pfnano/30511/scatter/pfnanooutput2.root
```
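To copy one of the output files to your machine for a closer look (keeping in mind that downloads from the bucket incur a cost), you could do, for example:

```bash
# Copy a single plot from the bucket to the current directory
gcloud storage cp gs://<YOUR_BUCKET_NAME>/pfnano/30511/plots/h_num_cands.png .
```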
## Costs
::::::::::::::::::::::::::::::::::::: keypoints

- Once the cluster is up, you first deploy the Argo Workflows services using `kubectl`.
- You submit and monitor the workflow with `argo`.
- You can see the output in the bucket with `gcloud` commands or in the Google Cloud Console web UI.

::::::::::::::::::::::::::::::::::::::::::::::::