To reboot a node, first drain it, reboot it, then add it back:
$ ssh metadig@docker-ucsb-4.dataone.org
metadig@docker-ucsb-4:~$ kubectl config use-context prod-k8s # Update ~metadig/.kube/config if this fails
metadig@docker-ucsb-4:~$ kubectl get nodes
NAME            STATUS   ROLES                  AGE     VERSION
docker-ucsb-4   Ready    control-plane,master   2y97d   v1.23.4
k8s-node-1      Ready    <none>                 362d    v1.23.4
k8s-node-2      Ready    <none>                 2y96d   v1.23.4
k8s-node-3      Ready    <none>                 2y96d   v1.23.4
metadig@docker-ucsb-4:~$ kubectl drain k8s-node-1 --ignore-daemonsets --delete-emptydir-data --force
Reboot the drained node:
$ ssh k8s-node-1.dataone.org
outin@k8s-node-1:~$ sudo reboot
Add the node back (from the controller):
metadig@docker-ucsb-4:~$ kubectl uncordon k8s-node-1
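To confirm the node rejoined the cluster and is schedulable again, check that it reports Ready:
metadig@docker-ucsb-4:~$ kubectl get nodes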
No steps are necessary before rebooting the controller (currently k8s-ctrl-1).
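After the controller comes back up, an optional sanity check is to confirm the API server responds and the control-plane pods are healthy:
kubectl get nodes
kubectl get pods -n kube-system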
All commands are run on the new K8s node unless otherwise noted.
- Create a new VM using the NCEAS Server Setup Docs
- Disable any swap files or partitions:
sudo swapoff -a
sudo vim /etc/fstab # comment out or remove any swap entries
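As a non-interactive alternative to editing /etc/fstab by hand, a sed one-liner like this should comment out any swap entries (a sketch; review the file afterwards):
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab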
- Install the K8s deb repo from https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/:
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
- Install K8s (the same version as on the controllers) and dependencies:
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common docker.io kubeadm=1.23.4-00 kubectl=1.23.4-00 kubelet=1.23.4-00
sudo apt-mark hold kubeadm kubelet kubectl
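The pinned versions must match what the controller is running; a quick way to double-check on both machines:
kubeadm version -o short
kubelet --version
apt-mark showhold # should list kubeadm, kubectl, and kubelet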
- Print the join command from the controller:
k8s-ctrl$ sudo kubeadm token create --print-join-command
- Paste and run the join command (with sudo) on the new node:
sudo kubeadm join ...
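The printed command generally has the following shape; the host, token, and hash below are placeholders, so use the exact output from the controller:
sudo kubeadm join <controller-host>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>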
- Verify that the new node has joined successfully from the controller:
k8s-ctrl$ kubectl get nodes -o wide
k8s-ctrl$ kubectl get pods -A -o wide
- Remove the new node if something went wrong (drain also cordons the node, so no separate cordon is needed):
k8s-ctrl$ kubectl drain k8s-node-new --ignore-daemonsets --delete-emptydir-data --force
k8s-ctrl$ kubectl delete node k8s-node-new
# Optional, run on the deleted node to reset K8s config
k8s-node-new$ sudo kubeadm reset
Different nodes may have different resources, and you can restrict a pod to run on particular node(s). To do so, first label the node. The following command gives the node k8s-dev-node-4 the label nceas/nodegroup with the value fast:
kubectl label nodes k8s-dev-node-4 nceas/nodegroup=fast
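To confirm the label took effect, list the nodes that match it:
kubectl get nodes -l nceas/nodegroup=fast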
The section below may be added at the top level of your application's values.yaml. It constrains the pod(s) to run on nodes that have the label nceas/nodegroup with the value fast. If no node matches the selector, the pod will not be scheduled and will remain Pending.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nceas/nodegroup
              operator: In
              values:
                - fast
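After deploying, it is worth verifying that the pods actually landed on a labeled node, and inspecting scheduler events if a pod is stuck Pending (the pod and namespace names below are placeholders):
kubectl get pods -n <namespace> -o wide
kubectl describe pod <pod-name> -n <namespace>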
- Check cert expiration:
sudo kubeadm certs check-expiration
- Renew the certs:
sudo kubeadm certs renew all
sudo kubeadm certs check-expiration
- Reboot the controller so the control-plane components pick up the renewed certs:
sudo reboot
- Update the config-dev/config-prod gpg file
  - Copy the keys cluster/certificate-authority-data and the user info for the dev-k8s or prod-k8s users (user/client-certificate-data and user/client-key-data) from /etc/kubernetes/admin.conf into the appropriate config-dev/config-prod file
    - Note that admin.conf uses a different username than we use in our client files (we use dev-k8s and prod-k8s as the context usernames)
    - Be sure to leave the other contexts (including user accounts, namespaces, etc.) in place, such as dev-slinky, dev-metadig, etc.
  - GPG encrypt the modified config-dev/config-prod file
  - Upload to https://github.nceas.ucsb.edu/NCEAS/security/tree/main/k8s
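As a sketch of the encryption step (assuming a recipient-based gpg workflow; the recipient shown is a placeholder for whatever key the security repo expects):
gpg --encrypt --recipient <key-id-or-email> config-prod
# produces config-prod.gpg, ready to upload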