Race condition in webhook certificate renewal with cert-manager self-signed issuer without a dedicated CA certificate #4019
Labels
good first issue
Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
kind/bug
Categorizes issue or PR as related to a bug.
triage/needs-investigation
Describe the bug
The Helm chart allows using cert-manager to create and manage the certificate that serves the webhook endpoints (https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/2ff2e59711c8e749c93b46c62dd60975598115c3/helm/aws-load-balancer-controller/templates/webhook.yaml#L225C1-L250C11). These certificates are issued by a selfSigned issuer (see aws-load-balancer-controller/helm/aws-load-balancer-controller/templates/webhook.yaml, line 249 in 2ff2e59).
This causes cert-manager to self-sign each generated certificate instead of signing it with a dedicated CA certificate. With the default cert lifetime of 60 days and a resulting renewal every 30 days, the "CA" is also replaced with each renewal.
While the ALC simply notices that the cert file has been updated and reloads it, the Kubernetes API is also involved: the mutatingwebhookconfiguration and validatingwebhookconfiguration named aws-load-balancer-webhook get their caBundle injected by the ca-injector from cert-manager. This process is independent of the certificate update and is therefore racy: the CA does not match for some time, until both mechanisms have converged. During that window, webhook invocations fail:
2025/01/15 08:26:55 http: TLS handshake error from 127.0.0.1:12345: remote error: tls: bad certificate
When NOT using cert-manager, a CA certificate with a 10-year lifetime is created; see aws-load-balancer-controller/helm/aws-load-balancer-controller/templates/_helpers.tpl, line 112 in 2ff2e59.
The same approach can and should be used with cert-manager. The NGINX Ingress Controller Helm chart does exactly that, with cert-manager also issuing the dedicated CA cert; see https://github.com/kubernetes/ingress-nginx/blob/8111b07adbe4ade4aba96bd52457b05fc737628f/charts/ingress-nginx/templates/admission-webhooks/cert-manager.yaml#L3-L28
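As a rough sketch of what this could look like (not the chart's actual templates): the selfSigned issuer only bootstraps a long-lived root CA, and a second CA issuer signs the serving certificate. All resource names, the namespace, the service DNS names, and the 10-year duration below are assumptions chosen for illustration.

```yaml
# Hypothetical names throughout; none of these are taken from the chart.
# 1. Self-signed issuer, used only to bootstrap the root CA.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-webhook-selfsigned-issuer
  namespace: kube-system
spec:
  selfSigned: {}
---
# 2. Long-lived root CA certificate (assumed 10 years, mirroring the
#    non-cert-manager code path described above).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-webhook-root-ca
  namespace: kube-system
spec:
  isCA: true
  commonName: example-webhook-root-ca
  secretName: example-webhook-root-ca
  duration: 87600h
  issuerRef:
    name: example-webhook-selfsigned-issuer
    kind: Issuer
---
# 3. CA issuer backed by the root CA secret.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-webhook-ca-issuer
  namespace: kube-system
spec:
  ca:
    secretName: example-webhook-root-ca
---
# 4. Serving certificate for the webhook, signed by the CA issuer.
#    Renewals replace only this leaf certificate, not the CA.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-webhook-serving-cert
  namespace: kube-system
spec:
  dnsNames:
    - example-webhook-service.kube-system.svc
    - example-webhook-service.kube-system.svc.cluster.local
  secretName: example-webhook-tls
  issuerRef:
    name: example-webhook-ca-issuer
    kind: Issuer
```

With a setup along these lines, the caBundle injected by the ca-injector would be the root CA, which stays the same when the serving certificate is renewed, so the two update mechanisms should no longer need to converge on every renewal.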
Steps to reproduce
Expected outcome
Environment
Additional Context: