Running on Azure

aiidalab · Feb 13, 2024 · e7f4c74
1 parent e7f2e50
Showing 17 changed files with 788 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,6 @@
+ssh-key-*
+secrets*
+k8s-deploy-venv/*
+values.yaml
+*/Chart.lock
+*/charts/*
diff --git a/README.md b/README.md
@@ -1 +1,201 @@
 # aiidalab-demo-server
+
+The instructions are adapted from [z2jh documentation for Azure deployment](https://z2jh.jupyter.org/en/stable/kubernetes/microsoft/step-zero-azure.html).
+
+## Pre-requisites
+
+Install the Azure CLI and log in to your account.
+
+```bash
+curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
+az login
+```
+
+You’ll need to open a browser and follow the instructions in your terminal to log in.
+
+Consider setting a [cloud budget](https://learn.microsoft.com/en-us/partner-center/set-an-azure-spending-budget-for-your-customers) for your Azure account.
+Only the account owner can do this; no budget has been applied to this deployment yet.
+
+Generate an SSH key pair if you don't have one already.
+
+```bash
+ssh-keygen -f ssh-key-aiidalab-demo-server
+```
+
+## Create an auto-scaling Kubernetes cluster
+
+```bash
+az group create --name aiidalab_demo_server_marvel --location=switzerlandnorth --output table
+```
+
+- `aiidalab_demo_server_marvel` is the name of the resource group.
+
+Create a virtual network (VNet) and a subnet so that the pods can communicate with each other and with the internet.
+
+```bash
+az network vnet create \
+   --resource-group aiidalab_demo_server_marvel \
+   --name aiidalab-vnet \
+   --address-prefixes 10.0.0.0/8 \
+   --subnet-name aiidalab-subnet \
+   --subnet-prefix 10.240.0.0/16
+```
+
+We will now retrieve the resource IDs of the VNet and subnet we just created and save them to bash variables.
+
+```bash
+VNET_ID=$(az network vnet show \
+   --resource-group aiidalab_demo_server_marvel \
+   --name aiidalab-vnet \
+   --query id \
+   --output tsv)
+SUBNET_ID=$(az network vnet subnet show \
+   --resource-group aiidalab_demo_server_marvel \
+   --vnet-name aiidalab-vnet \
+   --name aiidalab-subnet \
+   --query id \
+   --output tsv)
+```
+
+Create an Azure Active Directory (Azure AD) service principal for use with the cluster, and assign the Contributor role for use with the VNet.
+
+```bash
+SP_PASSWD=$(az ad sp create-for-rbac \
+   --name aiidalab-sp \
+   --role Contributor \
+   --scopes "$VNET_ID" \
+   --query password \
+   --output tsv)
+SP_ID=$(az ad app list \
+   --filter "displayname eq 'aiidalab-sp'" \
+   --query "[0].appId" \
+   --output tsv)
+```
+
+Time to create the Kubernetes cluster, and enable the auto-scaler at the same time.
+
+```bash
+az aks create \
+   --name demo-server \
+   --resource-group aiidalab_demo_server_marvel \
+   --ssh-key-value ssh-key-aiidalab-demo-server.pub \
+   --node-count 3 \
+   --node-vm-size Standard_D2s_v3 \
+   --service-principal "$SP_ID" \
+   --client-secret "$SP_PASSWD" \
+   --dns-service-ip 10.0.0.10 \
+   --network-plugin azure \
+   --network-policy azure \
+   --service-cidr 10.0.0.0/16 \
+   --vnet-subnet-id "$SUBNET_ID" \
+   --vm-set-type VirtualMachineScaleSets \
+   --enable-cluster-autoscaler \
+   --min-count 3 \
+   --max-count 6 \
+   --output table
+```
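The commands so far use four address values: the VNet range `10.0.0.0/8`, the node subnet `10.240.0.0/16`, the service CIDR `10.0.0.0/16` and the DNS service IP `10.0.0.10`. AKS expects the DNS service IP to lie inside the service CIDR and the service CIDR not to overlap the node subnet. The sketch below checks these relations with plain shell arithmetic; the helper names `ip_to_int` and `ip_in_cidr` are hypothetical and only meant as a sanity check when the ranges are changed.

```bash
# Hypothetical helpers (not part of this repository).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    (
        # Split on dots inside a subshell so the caller's IFS is untouched.
        IFS=.
        set -- $1
        echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    )
}

# Succeed if the address $1 lies inside the CIDR block $2 (e.g. 10.0.0.0/16).
ip_in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # 4294967295 is 0xFFFFFFFF; keep only the leading $bits bits.
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

ip_in_cidr 10.0.0.10 10.0.0.0/16  && echo "DNS service IP is inside the service CIDR"
ip_in_cidr 10.240.0.0 10.0.0.0/8  && echo "node subnet starts inside the VNet range"
ip_in_cidr 10.240.0.0 10.0.0.0/16 || echo "node subnet starts outside the service CIDR"
```

Checking only the first address of the subnet is enough here because both blocks are `/16` ranges; blocks of different sizes would need both ends compared.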
+
+The auto-scaler will scale the number of nodes in the cluster between 3 and 6: it adds nodes when pods cannot be scheduled for lack of CPU or memory, and removes nodes that stay underused.
+It can be updated later with the following command:
+
+```bash
+az aks update \
+   --name demo-server \
+   --resource-group aiidalab_demo_server_marvel \
+   --update-cluster-autoscaler \
+   --min-count <DESIRED-MINIMUM-COUNT> \
+   --max-count <DESIRED-MAXIMUM-COUNT> \
+   --output table
+```
+
+
+
+### Customizing the auto-scaler
+
+The auto-scaler can be customized to scale based on different metrics, such as CPU or memory usage.
+Go to the [Azure portal](https://portal.azure.com/) and navigate to the Kubernetes cluster.
+Under the "Resource" section, select the `VMSS`, and then "Custom autoscale".
+Two rules are applied to the VMSS:
+
+- Increase the instance count by 1 when the average CPU usage over 10 minutes is greater than 80%
+- Decrease the instance count by 1 when the average CPU usage over 10 minutes is less than 5%
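The effect of the two rules can be sketched as a small decision function. This is only an illustration of the thresholds listed above, not how Azure evaluates them: the function name `next_instance_count` is hypothetical, CPU usage is assumed to be an integer percentage already averaged over 10 minutes, and the default bounds of 3 and 6 are borrowed from the cluster auto-scaler settings (the VMSS rule has its own limits, which are not shown here).

```bash
# Hypothetical sketch (not part of this repository).
# Usage: next_instance_count CURRENT_COUNT AVG_CPU_PERCENT [MIN] [MAX]
next_instance_count() {
    count=$1
    cpu=$2
    min=${3:-3}   # assumed lower bound
    max=${4:-6}   # assumed upper bound
    if [ "$cpu" -gt 80 ] && [ "$count" -lt "$max" ]; then
        count=$((count + 1))      # scale out by one instance
    elif [ "$cpu" -lt 5 ] && [ "$count" -gt "$min" ]; then
        count=$((count - 1))      # scale in by one instance
    fi
    echo "$count"
}

next_instance_count 3 85   # high load: one instance is added
next_instance_count 4 3    # nearly idle: one instance is removed
next_instance_count 6 95   # high load, but already at the upper bound
```

The wide gap between the 80% and 5% thresholds means the instance count stays put for most loads, which avoids rapid scale-out and scale-in cycles.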
+
+## Install kubectl and Helm
+
+The above setup is generally done only once.
+Make sure the [Pre-requisites](#pre-requisites) are completed before proceeding, so that the `az` command is available.
+If the cluster is already created, the SSH key on the cluster can be updated with the following command:
+
+```bash
+az aks update \
+   --name demo-server \
+   --resource-group aiidalab_demo_server_marvel \
+   --ssh-key-value <your-pub-ssh-key>.pub
+```
+
+The command will update the key on all node pools.
+
+The following steps are for administrators/maintainers of the cluster to carry out on their local machines.
+
+If you’re using the Azure CLI locally, install kubectl, a tool for accessing the Kubernetes API from the command line.
+You may need sudo to install the commands to `/usr/local/bin`:
+
+```bash
+az aks install-cli
+```
+
+Get credentials from Azure for kubectl to work:
+
+```bash
+az aks get-credentials \
+   --name demo-server \
+   --resource-group aiidalab_demo_server_marvel \
+   --output table
+```
+
+This will update the `~/.kube/config` file with the credentials for the Kubernetes cluster.
+
+Now the nodes are ready to be used.
+You can check the status of the nodes with the following command:
+
+```bash
+kubectl get nodes
+```
+
+Helm is a package manager for Kubernetes, and it is used to install JupyterHub.
+
+```bash
+curl https://raw.githubusercontent.com/helm/helm/HEAD/scripts/get-helm-3 | bash
+```
+
+## Install JupyterHub
+
+Running the helm command will install JupyterHub with the configuration in `values.yaml`.
+Before running the command, make sure that `values.yaml` has been rendered from the Jinja2 template with the correct configuration.
+
+```bash
+## Create a python environment for the deployment
+python3 -m venv k8s-deploy-venv
+source k8s-deploy-venv/bin/activate
+
+## Install the requirements
+python3 -m pip install -r requirements.txt
+```
+
+The following environment variables are required to be set:
+
+* `K8S_NAMESPACE`: The namespace where the JupyterHub will be installed, e.g. `production`, `staging`.
+* `GITHUB_CLIENT_ID`: The client ID of the GitHub app.
+* `GITHUB_CLIENT_SECRET`: The client secret of the GitHub app.
+* `OAUTH_CALLBACK_URL`: The callback URL of the GitHub app.
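A missing variable is easier to spot before the deployment starts than in a half-rendered configuration. The sketch below shows one way to check the four variables listed above; the function `check_required_env` is hypothetical, and how `deploy.sh` itself validates its environment is not shown in this section.

```bash
# Hypothetical pre-flight check (not part of this repository).
check_required_env() {
    missing=0
    for name in K8S_NAMESPACE GITHUB_CLIENT_ID GITHUB_CLIENT_SECRET OAUTH_CALLBACK_URL; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_required_env; then
    echo "Environment looks complete; ./deploy.sh can be run."
else
    echo "Set the variables listed above before running ./deploy.sh." >&2
fi
```

The loop reports every missing variable instead of stopping at the first one, so a single run is enough to see what still needs to be set.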
+
+We use the GitHub OAuthenticator, so users can log in with their GitHub account.
+The OAuth app is registered under the `aiidalab-bot` user with the app name `aiidalab-demo-server`.
+
+To deploy the JupyterHub, run the following command:
+
+```bash
+./deploy.sh
+```
+
+If the namespace does not exist, it will be created.
diff --git a/basehub/Chart.yaml b/basehub/Chart.yaml
@@ -0,0 +1,9 @@
+---
+apiVersion: v2
+description: Deployment Chart for JupyterHub
+name: basehub
+version: 0.1.0
+dependencies:
+    - name: jupyterhub
+      version: 3.1.0
+      repository: https://jupyterhub.github.io/helm-chart/
diff --git a/basehub/files/etc/jupyterhub/jupyter_notebook_config.py b/basehub/files/etc/jupyterhub/jupyter_notebook_config.py
@@ -0,0 +1,17 @@
+c.NotebookApp.extra_template_paths.append('/etc/jupyterhub/templates')
+
+# cull_idle_timeout: timeout (in seconds) after which an idle kernel is
+# considered ready to be culled
+c.MappingKernelManager.cull_idle_timeout = 1200 # default: 0
+
+# cull_interval: the interval (in seconds) on which to check for idle
+# kernels exceeding the cull timeout value
+c.MappingKernelManager.cull_interval = 120 # default: 300
+
+# cull_connected: whether to consider culling kernels which have one
+# or more connections
+c.MappingKernelManager.cull_connected = True # default: False
+
+# cull_busy: whether to consider culling kernels which are currently
+# busy running some code
+c.MappingKernelManager.cull_busy = False # default: False
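Taken together, the four settings above mean that a kernel is shut down once it has been idle for more than 1200 seconds, even if a browser tab is still connected to it, but never while it is busy. Because the check only runs every 120 seconds, the shutdown can happen up to about two minutes after the timeout has passed. The decision can be sketched as follows; this is a simplified illustration with a hypothetical `should_cull` function, not the Jupyter implementation.

```bash
# Simplified sketch of the culling decision (not the Jupyter source code).
CULL_IDLE_TIMEOUT=1200   # seconds, as configured above
CULL_CONNECTED=true      # connected kernels may be culled
CULL_BUSY=false          # busy kernels are never culled

# Usage: should_cull IDLE_SECONDS OPEN_CONNECTIONS BUSY(true|false)
should_cull() {
    idle=$1
    connections=$2
    busy=$3
    # The kernel must have been idle for longer than the timeout.
    [ "$idle" -gt "$CULL_IDLE_TIMEOUT" ] || return 1
    # Kernels with open connections are only culled if cull_connected is set.
    if [ "$connections" -gt 0 ] && [ "$CULL_CONNECTED" != true ]; then
        return 1
    fi
    # Busy kernels are only culled if cull_busy is set.
    if [ "$busy" = true ] && [ "$CULL_BUSY" != true ]; then
        return 1
    fi
    return 0
}

should_cull 1500 1 false && echo "idle kernel with an open tab: culled"
should_cull 1500 0 true  || echo "busy kernel: kept"
should_cull 600 0 false  || echo "recently active kernel: kept"
```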
diff --git a/basehub/files/etc/jupyterhub/templates/README.md b/basehub/files/etc/jupyterhub/templates/README.md
@@ -0,0 +1,3 @@
+# Templates
+
+The templates are adapted from [LibreTexts/jupyterhub-templates](https://github.com/LibreTexts/jupyterhub-templates).
diff --git a/basehub/files/etc/jupyterhub/templates/about.html b/basehub/files/etc/jupyterhub/templates/about.html
@@ -0,0 +1,10 @@
+<html>
+    <head>
+        <title>About Page Redirect</title>
+     <meta charset="UTF-8" />
+     <meta http-equiv="refresh" content="0; URL=https://www.aiidalab.net/about/" />
+   </head>
+   <body>
+     <p>This page has not yet been customized for the PSI AiiDAlab deployment, so it redirects to the generic AiiDAlab about page. If you are not redirected, click <a href="https://www.aiidalab.net/about/">here</a> to go to the page.</p>
+   </body>
+</html>
diff --git a/basehub/files/etc/jupyterhub/templates/faq.html b/basehub/files/etc/jupyterhub/templates/faq.html
@@ -0,0 +1,10 @@
+<html>
+    <head>
+        <title>FAQ Page Redirect</title>
+     <meta charset="UTF-8" />
+     <meta http-equiv="refresh" content="0; URL=https://aiidalab.readthedocs.io/en/latest/usage/index.html#aiidalab-home-page" />
+   </head>
+   <body>
+     <p>This page has not yet been customized for the PSI AiiDAlab deployment, so it redirects to the generic AiiDAlab documentation page. If you are not redirected, click <a href="https://aiidalab.readthedocs.io/en/latest/usage/index.html#aiidalab-home-page">here</a> to go to the page.</p>
+   </body>
+</html>