Race condition in webhook certificate renewal with cert-manager self-signed issuer without a dedicated CA certificate #4019

Open
frittentheke opened this issue Jan 15, 2025 · 0 comments
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. triage/needs-investigation

Describe the bug
The Helm chart allows using cert-manager to create and manage the certificate that serves the webhook endpoints (https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/2ff2e59711c8e749c93b46c62dd60975598115c3/helm/aws-load-balancer-controller/templates/webhook.yaml#L225C1-L250C11). These certificates are issued by a selfSigned issuer.

As a result, cert-manager literally self-signs each generated certificate instead of signing it with a dedicated CA certificate. With the default cert lifetime of 60 days and the resulting renewal every 30 days, the "CA" is therefore also replaced on every renewal.

While the ALC simply notices that the cert file was updated and reloads it, the Kubernetes API relies on the mutatingwebhookconfiguration and validatingwebhookconfiguration named aws-load-balancer-webhook, which get their caBundle injected by the ca-injector from cert-manager.

This injection is independent of the certificate update and is therefore racy: the CA does not match for some time, until both mechanisms have converged. During that window, webhook invocations fail:

  • 2025/01/15 08:26:55 http: TLS handshake error from 127.0.0.1:12345: remote error: tls: bad certificate

When NOT using cert-manager, a CA certificate with a 10-year lifetime is created, see

{{- $cert := genSignedCert (include "aws-load-balancer-controller.fullname" .) nil $altNames 3650 $ca -}}

The same approach can and should also be applied to the cert-manager path. The NGINX Ingress Controller Helm chart does exactly that, but has cert-manager issue a dedicated CA certificate as well, see https://github.com/kubernetes/ingress-nginx/blob/8111b07adbe4ade4aba96bd52457b05fc737628f/charts/ingress-nginx/templates/admission-webhooks/cert-manager.yaml#L3-L28
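
A rough sketch of what such a setup could look like with plain cert-manager resources follows. It is not taken from either chart: all resource names, the kube-system namespace, the secret names, and the DNS name are hypothetical placeholders, and the 87600h duration is only chosen to mirror the 10-year lifetime of the non-cert-manager path.

```yaml
# Sketch only: names, namespace, secret names and dnsNames are hypothetical.

# 1. Bootstrap issuer, used only to sign the CA certificate below.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: aws-load-balancer-selfsigned-issuer
  namespace: kube-system
spec:
  selfSigned: {}
---
# 2. Long-lived, dedicated CA certificate (self-signed via the issuer above).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: aws-load-balancer-root-cert
  namespace: kube-system
spec:
  isCA: true
  commonName: aws-load-balancer-webhook-ca
  secretName: aws-load-balancer-root-cert
  duration: 87600h  # roughly 10 years (assumption)
  issuerRef:
    name: aws-load-balancer-selfsigned-issuer
    kind: Issuer
---
# 3. CA issuer that signs with the key pair stored in the CA secret.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: aws-load-balancer-root-issuer
  namespace: kube-system
spec:
  ca:
    secretName: aws-load-balancer-root-cert
---
# 4. Short-lived serving certificate for the webhook, signed by the CA issuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: aws-load-balancer-serving-cert
  namespace: kube-system
spec:
  secretName: aws-load-balancer-tls
  dnsNames:
    - aws-load-balancer-webhook-service.kube-system.svc
  issuerRef:
    name: aws-load-balancer-root-issuer
    kind: Issuer
```

With a chain like this, the webhook configurations can keep their cert-manager.io/inject-ca-from annotation pointing at the serving certificate. The injected caBundle would then be the dedicated CA, which stays the same across serving certificate renewals, so the race should only be able to occur when the CA itself is rotated.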

Steps to reproduce

  • Use cert-manager to manage webhook certificates
  • Trigger certificate renewals while actively making requests to the webhooks (e.g. by scheduling pods)

Expected outcome
A concise description of what you expected to happen.

Environment

  • AWS Load Balancer controller version: 2.10.1
  • Kubernetes version: 1.30.x
  • Using EKS: yes

Additional Context:

@shraddhabang shraddhabang added kind/bug Categorizes issue or PR as related to a bug. triage/needs-investigation labels Jan 15, 2025
@shraddhabang shraddhabang added the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Jan 22, 2025