Running on Azure
unkcpz committed Feb 13, 2024
1 parent e7f2e50 commit e7f4c74
Showing 17 changed files with 788 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
ssh-key-*
secrets*
k8s-deploy-venv/*
values.yaml
*/Chart.lock
*/charts/*
200 changes: 200 additions & 0 deletions README.md
@@ -1 +1,201 @@
# aiidalab-demo-server

The instructions are adapted from [z2jh documentation for Azure deployment](https://z2jh.jupyter.org/en/stable/kubernetes/microsoft/step-zero-azure.html).

## Pre-requisites

Install the Azure CLI and log in to your account.

```bash
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az login
```

You’ll need to open a browser and follow the instructions in your terminal to log in.

Consider setting a [cloud budget](https://learn.microsoft.com/en-us/partner-center/set-an-azure-spending-budget-for-your-customers) for your Azure account.
Only the account owner can do this. No budget has been applied yet.

Generate an SSH key pair if you don't have one already.

```bash
ssh-keygen -f ssh-key-aiidalab-demo-server
```

## Create an auto-scaling Kubernetes cluster

```bash
az group create --name aiidalab_demo_server_marvel --location=switzerlandnorth --output table
```

- `aiidalab_demo_server_marvel` is the name of the resource group.

Create a virtual network (VNet) and a subnet so that the pods can communicate with each other and with the internet.

```bash
az network vnet create \
--resource-group aiidalab_demo_server_marvel \
--name aiidalab-vnet \
--address-prefixes 10.0.0.0/8 \
--subnet-name aiidalab-subnet \
--subnet-prefix 10.240.0.0/16
```

We will now retrieve the resource IDs of the VNet and subnet we just created and save them to bash variables.

```bash
VNET_ID=$(az network vnet show \
--resource-group aiidalab_demo_server_marvel \
--name aiidalab-vnet \
--query id \
--output tsv)
SUBNET_ID=$(az network vnet subnet show \
--resource-group aiidalab_demo_server_marvel \
--vnet-name aiidalab-vnet \
--name aiidalab-subnet \
--query id \
--output tsv)
```

Create an Azure Active Directory (Azure AD) service principal for use with the cluster, and assign the Contributor role for use with the VNet.

```bash
SP_PASSWD=$(az ad sp create-for-rbac \
--name aiidalab-sp \
--role Contributor \
--scopes "$VNET_ID" \
--query password \
--output tsv)
SP_ID=$(az ad app list \
--filter "displayname eq 'aiidalab-sp'" \
--query "[0].appId" \
--output tsv)
```

Time to create the Kubernetes cluster, and enable the auto-scaler at the same time.

```bash
az aks create \
--name demo-server \
--resource-group aiidalab_demo_server_marvel \
--ssh-key-value ssh-key-aiidalab-demo-server.pub \
--node-count 3 \
--node-vm-size Standard_D2s_v3 \
--service-principal "$SP_ID" \
--client-secret "$SP_PASSWD" \
--dns-service-ip 10.0.0.10 \
--network-plugin azure \
--network-policy azure \
--service-cidr 10.0.0.0/16 \
--vnet-subnet-id "$SUBNET_ID" \
--vm-set-type VirtualMachineScaleSets \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 6 \
--output table
```
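The address flags in this command depend on each other: AKS requires the `--dns-service-ip` to lie inside the `--service-cidr` range, and the node subnet has to sit inside the VNet address space. The following is a minimal sketch of how that arithmetic can be checked in plain bash. The helper names `ip_to_int` and `ip_in_cidr` are illustrative and not part of this repository.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  # Unquoted on purpose: split the address into four octets on ".".
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (e.g. 10.0.0.0/16).
ip_in_cidr() {
  local network=${2%/*} prefix=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$network") & mask )) ]
}

# Values taken from the commands in this README.
ip_in_cidr 10.0.0.10 10.0.0.0/16 && echo "DNS service IP is inside the service CIDR"
ip_in_cidr 10.240.0.0 10.0.0.0/8 && echo "subnet start is inside the VNet prefix"
```

Both checks compare only the network bits selected by the prefix length, so the same helper works for any of the ranges used here.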

The cluster auto-scaler keeps the number of nodes between 3 and 6: it adds nodes when pods cannot be scheduled for lack of resources, and removes nodes that stay underutilized.
The limits can be updated later with the following command:

```bash
az aks update \
--name demo-server \
--resource-group aiidalab_demo_server_marvel \
--update-cluster-autoscaler \
--min-count <DESIRED-MINIMUM-COUNT> \
--max-count <DESIRED-MAXIMUM-COUNT> \
--output table
```
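To confirm the limits after an update, the node pool settings can be read back from the cluster. This is a small sketch, assuming the `agentPoolProfiles` fields `enableAutoScaling`, `minCount`, and `maxCount` as returned by `az aks show`. The function name `show_autoscaler_limits` is made up for this example.

```bash
# Print the autoscaler settings of every node pool in a cluster.
# Usage: show_autoscaler_limits <cluster-name> <resource-group>
show_autoscaler_limits() {
  az aks show \
    --name "$1" \
    --resource-group "$2" \
    --query "agentPoolProfiles[].{pool:name, autoscale:enableAutoScaling, min:minCount, max:maxCount}" \
    --output table
}
```

For the cluster created above, this would be called as `show_autoscaler_limits demo-server aiidalab_demo_server_marvel`.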



### Customizing the auto-scaler

The scaling rules can be customized to use metrics such as CPU or memory usage.
Go to the [Azure portal](https://portal.azure.com/) and navigate to the Kubernetes cluster.
Under the "Resource" section, select the virtual machine scale set (`VMSS`), and then "Custom autoscale".
Two rules are applied to the VMSS:

- Increase the instance count by 1 when the average CPU usage over 10 minutes is greater than 80%
- Decrease the instance count by 1 when the average CPU usage over 10 minutes is less than 5%

## Install kubectl and Helm

The setup above generally needs to be done only once.
Make sure the [Pre-requisites](#pre-requisites) are completed before proceeding, so that the `az` command is available.
If the cluster already exists, its SSH key can be updated with the following command:

```bash
az aks update \
--name demo-server \
--resource-group aiidalab_demo_server_marvel \
--ssh-key-value <your-pub-ssh-key>.pub
```

The command will update the key on all node pools.

The following steps are for administrators/maintainers of the cluster to carry out on their local machines.

If you’re using the Azure CLI locally, install kubectl, a tool for accessing the Kubernetes API from the command line.
You may need sudo to install the binaries to `/usr/local/bin`.

```bash
az aks install-cli
```

Get credentials from Azure for kubectl to work:

```bash
az aks get-credentials \
--name demo-server \
--resource-group aiidalab_demo_server_marvel \
--output table
```

This will update the `~/.kube/config` file with the credentials for the Kubernetes cluster.

Now the nodes are ready to be used.
You can check the status of the nodes with the following command:

```bash
kubectl get nodes
```
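Newly added nodes can take a few minutes to register. As a sketch, `kubectl wait` can block until every node reports the `Ready` condition. The helper name `wait_for_nodes` and the default timeout of 300 seconds are arbitrary choices for this example.

```bash
# Block until all nodes are Ready, or fail once the timeout expires.
# Usage: wait_for_nodes [timeout]   (default: 300s)
wait_for_nodes() {
  kubectl wait --for=condition=Ready nodes --all --timeout="${1:-300s}"
}
```

Calling `wait_for_nodes 600s` would allow up to ten minutes, which may be useful right after scaling up.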

Helm is a package manager for Kubernetes, and it is used to install JupyterHub.

```bash
curl https://raw.githubusercontent.com/helm/helm/HEAD/scripts/get-helm-3 | bash
```

## Install JupyterHub

Running the helm command will install JupyterHub with the configuration in `values.yaml`.
Before running the command, make sure `values.yaml` has been rendered from the jinja2 template with the correct configuration.

```bash
## Create a python environment for the deployment
python3 -m venv k8s-deploy-venv
source k8s-deploy-venv/bin/activate

## Install the requirements
python3 -m pip install -r requirements.txt
```

The following environment variables must be set:

* `K8S_NAMESPACE`: The namespace where the JupyterHub will be installed, e.g. `production`, `staging`.
* `GITHUB_CLIENT_ID`: The client ID of the GitHub app.
* `GITHUB_CLIENT_SECRET`: The client secret of the GitHub app.
* `OAUTH_CALLBACK_URL`: The callback URL of the GitHub app.

We use the GitHub OAuthenticator, so users can log in with their GitHub account.
The OAuth app is registered under the `aiidalab-bot` user with the app name `aiidalab-demo-server`.
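Because a missing variable would otherwise only surface during deployment, it can help to check all four up front. The function below is a hypothetical pre-flight check, not something this commit's `deploy.sh` is known to contain. It uses `eval` for indirect lookup so that it also runs in POSIX `sh`.

```bash
# Return non-zero and list every required variable that is unset or empty.
check_required_env() {
  missing=0
  for name in K8S_NAMESPACE GITHUB_CLIENT_ID GITHUB_CLIENT_SECRET OAUTH_CALLBACK_URL; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Run as `check_required_env && ./deploy.sh`, the deployment would start only when all four variables are set.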

To deploy JupyterHub, run the following command:

```bash
./deploy.sh
```

If the namespace does not exist, it will be created.
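
Once the script has finished, the release can be inspected with standard Helm and kubectl commands. The sketch below assumes the namespace is the one given in `K8S_NAMESPACE` and that the public proxy service keeps the upstream z2jh name `proxy-public`. `check_deployment` is a hypothetical helper, not something `deploy.sh` provides.

```bash
# List the Helm release, the pods, and the public proxy service in a namespace.
# Usage: check_deployment [namespace]   (default: $K8S_NAMESPACE)
check_deployment() {
  local ns=${1:-$K8S_NAMESPACE}
  helm list --namespace "$ns"
  kubectl --namespace "$ns" get pods
  kubectl --namespace "$ns" get service proxy-public
}
```

If the assumption about the service name holds, the last command shows the external IP under which the hub is reachable.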
9 changes: 9 additions & 0 deletions basehub/Chart.yaml
@@ -0,0 +1,9 @@
---
apiVersion: v2
description: Deployment Chart for JupyterHub
name: basehub
version: 0.1.0
dependencies:
- name: jupyterhub
  version: 3.1.0
  repository: https://jupyterhub.github.io/helm-chart/
17 changes: 17 additions & 0 deletions basehub/files/etc/jupyterhub/jupyter_notebook_config.py
@@ -0,0 +1,17 @@
c.NotebookApp.extra_template_paths.append('/etc/jupyterhub/templates')

# cull_idle_timeout: timeout (in seconds) after which an idle kernel is
# considered ready to be culled
c.MappingKernelManager.cull_idle_timeout = 1200 # default: 0

# cull_interval: the interval (in seconds) on which to check for idle
# kernels exceeding the cull timeout value
c.MappingKernelManager.cull_interval = 120 # default: 300

# cull_connected: whether to consider culling kernels which have one
# or more connections
c.MappingKernelManager.cull_connected = True # default: False

# cull_busy: whether to consider culling kernels which are currently
# busy running some code
c.MappingKernelManager.cull_busy = False # default: False
3 changes: 3 additions & 0 deletions basehub/files/etc/jupyterhub/templates/README.md
@@ -0,0 +1,3 @@
# Templates

The templates are adapted from [LibreTexts/jupyterhub-templates](https://github.com/LibreTexts/jupyterhub-templates).
10 changes: 10 additions & 0 deletions basehub/files/etc/jupyterhub/templates/about.html
@@ -0,0 +1,10 @@
<html>
<head>
<title>About Page Redirect</title>
<meta charset="UTF-8" />
<meta http-equiv="refresh" content="0; URL=https://www.aiidalab.net/about/" />
</head>
<body>
<p>This page has not yet been customized for the PSI AiiDAlab deployment, so it redirects to the generic AiiDAlab about page. If you are not redirected, click <a href="https://www.aiidalab.net/about/">here</a> to go to the page.</p>
</body>
</html>
10 changes: 10 additions & 0 deletions basehub/files/etc/jupyterhub/templates/faq.html
@@ -0,0 +1,10 @@
<html>
<head>
<title>FAQ Page Redirect</title>
<meta charset="UTF-8" />
<meta http-equiv="refresh" content="0; URL=https://aiidalab.readthedocs.io/en/latest/usage/index.html#aiidalab-home-page" />
</head>
<body>
<p>This page has not yet been customized for the PSI AiiDAlab deployment, so it redirects to the generic AiiDAlab documentation page. If you are not redirected, click <a href="https://aiidalab.readthedocs.io/en/latest/usage/index.html#aiidalab-home-page">here</a> to go to the page.</p>
</body>
</html>