docs/Troubleshooting: document node-ip choice and pcibus scanning
Felix Wischke (65278) committed Apr 10, 2024
1 parent 18e3933 commit ad338e5
Showing 1 changed file with 37 additions and 0 deletions.
37 changes: 37 additions & 0 deletions docs/Troubleshooting.md
@@ -67,3 +67,40 @@ If your capi-controller is too new, you can pass a `--core cluster-api:v1.6.1` d
## Calico fails in IPVS mode with loadBalancers to expose services
Calico unfortunately does not test connectivity when it chooses a node IP to use for IPVS communication.
This choice can be overridden manually. More on this topic in [Calico's documentation](https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection#autodetection-methods).
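
As a minimal sketch of such a manual override, the helper below pins Calico's IPv4 autodetection to a named interface through the `IP_AUTODETECTION_METHOD` environment variable described in the linked documentation. The function name is hypothetical, and the sketch assumes a manifest-based install where `calico-node` runs as a DaemonSet in `kube-system`; operator-managed installs are configured differently.

```shell
# Sketch only: pin Calico's IPv4 autodetection to one interface.
# Assumption: calico-node is a DaemonSet in the kube-system namespace
# (manifest-based install). The function name is hypothetical.
set_calico_autodetect_interface() {
    iface_regex="$1"   # interface name or regex, e.g. ens18
    kubectl set env daemonset/calico-node -n kube-system \
        "IP_AUTODETECTION_METHOD=interface=${iface_regex}"
}

# Example invocation (not run automatically):
#   set_calico_autodetect_interface ens18
```

Changing the environment variable triggers a rolling restart of the `calico-node` pods, so it is best applied during a maintenance window.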

## Nodes fail to deploy/have wrong node-ip with mixed interface models
Kubelet chooses the first interface to acquire a node-ip for kubeadm. The first
interface is determined by the in-kernel order, which in turn depends on the order
in which the PCI bus is scanned and the drivers are loaded.

As an example:
```
kubectl get nodes -o wide
NAME                               STATUS     ROLES                AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
test-cluster-control-plane-gcgc6   Ready      control-plane        11h   v1.26.7   10.0.1.69     <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.7.6
test-cluster-load-balancer-c8rd2   Ready      load-balancer,node   11h   v1.26.7   10.0.2.155    <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.7.6
test-cluster-load-balancer-wqbcg   Ready      load-balancer,node   11h   v1.26.7   10.0.2.152    <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.7.6
test-cluster-worker-hbm8s          Ready      node                 11h   v1.26.7   10.0.1.71     <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.7.6
test-cluster-worker-n2vbc          NotReady   node                 17m   v1.26.7   10.0.1.73     <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.7.6
```

The load-balancers use an `e1000` interface (`ens18`) for their default network, whereas `ens19` and `ens20` are `virtio` interfaces:
```
root@test-cluster-load-balancer-zrjx8:~# ip -o l sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio state UP mode DEFAULT group default qlen 1000\ link/ether 0a:97:89:e5:7f:1d brd ff:ff:ff:ff:ff:ff\ altname enp0s19
3: ens20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio master vrf-ext state UP mode DEFAULT group default qlen 1000\ link/ether 9a:58:08:40:a2:70 brd ff:ff:ff:ff:ff:ff\ altname enp0s20
4: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio state UP mode DEFAULT group default qlen 1000\ link/ether 16:7a:ee:74:23:0d brd ff:ff:ff:ff:ff:ff\ altname enp0s18
```
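
To spot this situation quickly, the listing can be reduced to the first non-loopback interface in kernel order. The helper below is a rough sketch with a hypothetical function name; it only parses `ip -o link show` output from stdin and assumes, as described in this section, that the lowest-index interface is the one that gets picked.

```shell
# Sketch only: print the first non-loopback interface from
# `ip -o link show` output supplied on stdin.
# The function name is hypothetical.
first_non_loopback() {
    awk -F': ' '
        $2 != "lo" {
            name = $2
            sub(/@.*/, "", name)   # strip "@parent" suffixes such as eth0@if5
            print name
            exit
        }'
}

# Example invocation on a node (not run automatically):
#   ip -o link show | first_non_loopback
```

If the printed name is not the interface that carries the default network, the node is likely to end up with the wrong node-ip.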

This is the order the interfaces are created in:
```
root@test-cluster-load-balancer-zrjx8:~# dmesg -t | grep eth
virtio_net virtio2 ens19: renamed from eth0
virtio_net virtio3 ens20: renamed from eth1
e1000 0000:00:12.0 eth0: (PCI:33MHz:32-bit) 16:7a:ee:74:23:0d
e1000 0000:00:12.0 eth0: Intel(R) PRO/1000 Network Connection
e1000 0000:00:12.0 ens18: renamed from eth0
```
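
When `dmesg` has already rotated, the driver behind an interface can usually also be read from sysfs. The following is a small sketch under that assumption; the function name and the optional second argument (an alternative sysfs root, useful for testing) are hypothetical.

```shell
# Sketch only: print the kernel driver bound to a network interface
# by resolving the sysfs "driver" symlink of its device.
# The function name and the sysfs_root parameter are hypothetical.
iface_driver() {
    iface="$1"
    sysfs_root="${2:-/sys}"
    link="${sysfs_root}/class/net/${iface}/device/driver"
    # Virtual interfaces such as lo have no backing device.
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Example invocation on a node (not run automatically):
#   for i in ens18 ens19 ens20; do echo "$i: $(iface_driver "$i")"; done
```

On the node shown above this would be expected to report `e1000` for `ens18` and `virtio_net` for the other two interfaces, matching the `dmesg` output.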

If you absolutely must mix interface types, make sure that the default network interface is the one that comes up first.
