Reconciler error due to unavailable secrets #1113

Open
timbuchwaldt opened this issue Apr 5, 2022 · 2 comments
Labels
🚀 enhancement New feature or request

Comments

@timbuchwaldt

timbuchwaldt commented Apr 5, 2022

What steps did you take and what happened:

We are running a mostly default Starboard installation but continuously see errors in our logs caused by secrets that are no longer available.
This seems to happen only for our GitLab CI jobs, which are very short-lived; their secrets are auto-deleted accordingly.

What did you expect to happen:

No errors / no repeated errors on short-lived pods / deleted secrets.

Anything else you would like to add:

We see the following log repeat:

{
   "level":"error",
   "ts":1649151260.958836,
   "logger":"controller.pod",
   "msg":"Reconciler error",
   "reconciler group":"",
   "reconciler kind":"Pod",
   "name":"runner-my-secret-runner-123",
   "namespace":"gitlab-runner-legacy",
   "error":"getting secret by name: gitlab-runnerrunner-my-secret-runner-123: Secret \"runner-my-secret-runner-123-1234\" not found",
   "stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"
}

Environment:

  • Starboard version (use starboard version): 0.15.1
  • Kubernetes version (use kubectl version): 1.22.5
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): Ubuntu 20.04 (server)
@danielpacak
Contributor

danielpacak commented Apr 5, 2022

Thank you for the feedback, @timbuchwaldt. I think this might be related to #808, which makes me think about a few solutions:

  1. As suggested in #808 (Reconciler fast workload issue), we could check whether a Job has been running for some time, but chances are it will still get deleted right after we check its age.
  2. Don't scan Jobs at all. But if you have long-running Jobs, you probably want to check them for vulnerabilities anyway.
  3. Add exclusion logic based on label selectors:
    • In 0.15 we added a new environment variable to exclude certain namespaces, i.e. OPERATOR_EXCLUDE_NAMESPACES, but maybe we need more granularity to exclude GitLab Jobs and similar workloads.
    • #670 (Exclude workloads from scanning) is where we discussed similar ideas.
    • @timbuchwaldt, are GitLab Jobs created in a specific namespace, or might they be created in any namespace?

Please let us know if you have any other ideas!
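
To make ideas 1 and 3 above concrete, here is a minimal sketch of a scan filter in Go. It is not Starboard code: `workload`, `scanFilter`, `parseNamespaces`, `demoFilter`, `demoNow`, and the `example.com/skip-scan` label are all hypothetical names, and the sketch assumes the excluded namespaces arrive as a plain comma-separated list. As noted in idea 1, the age check only narrows the race; a workload can still be deleted right after it passes.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// workload is a hypothetical, simplified stand-in for the Pod or Job
// metadata an operator would inspect. It is not a Starboard type.
type workload struct {
	Namespace string
	Labels    map[string]string
	CreatedAt time.Time
}

// scanFilter holds hypothetical exclusion settings.
type scanFilter struct {
	ExcludedNamespaces map[string]bool   // idea 3: exclude by namespace
	ExcludedLabels     map[string]string // idea 3: exclude by label
	MinAge             time.Duration     // idea 1: skip very young workloads
}

// parseNamespaces turns a comma-separated list into a set,
// ignoring surrounding whitespace and empty entries.
func parseNamespaces(raw string) map[string]bool {
	out := make(map[string]bool)
	for _, ns := range strings.Split(raw, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			out[ns] = true
		}
	}
	return out
}

// shouldScan reports whether w passes every exclusion rule at time now.
func (f scanFilter) shouldScan(w workload, now time.Time) bool {
	if f.ExcludedNamespaces[w.Namespace] {
		return false
	}
	for key, want := range f.ExcludedLabels {
		if got, ok := w.Labels[key]; ok && got == want {
			return false
		}
	}
	// Racy by nature: the workload may vanish right after this check.
	return now.Sub(w.CreatedAt) >= f.MinAge
}

// demoNow is a fixed clock value so the example is deterministic.
var demoNow = time.Date(2022, time.April, 5, 12, 0, 0, 0, time.UTC)

// demoFilter builds the illustrative settings used below.
func demoFilter() scanFilter {
	return scanFilter{
		ExcludedNamespaces: parseNamespaces("gitlab-runner-legacy, ci"),
		ExcludedLabels:     map[string]string{"example.com/skip-scan": "true"},
		MinAge:             2 * time.Minute,
	}
}

func main() {
	f := demoFilter()
	ciPod := workload{Namespace: "gitlab-runner-legacy", CreatedAt: demoNow.Add(-10 * time.Second)}
	appPod := workload{Namespace: "default", CreatedAt: demoNow.Add(-time.Hour)}
	fmt.Println("scan CI pod:", f.shouldScan(ciPod, demoNow))
	fmt.Println("scan app pod:", f.shouldScan(appPod, demoNow))
}
```

Passing `now` in explicitly keeps the predicate deterministic and easy to test; a real controller would presumably evaluate something like this before creating a scan job.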

danielpacak added the 🚀 enhancement (New feature or request) label on Apr 5, 2022
@timbuchwaldt
Author

Oh yeah that sounds exactly the same, yes.

  1. Yeah that is for sure not as stable, but could be better.
  2. I think I'd want some jobs scanned, although most are short-lived, too.
  3. Excluded namespaces sounds feasible for now, yeah! Those live in very specific namespaces without other things.

In general, more lenient failure handling seems appropriate to me. Any pod could die before or during a scan, so I think the operator should stop retrying after some time, or once the pod, its secrets, or the like are gone.
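
A rough sketch of what that could look like, in plain Go with only the standard library. Everything here is hypothetical and not taken from Starboard: `errNotFound` stands in for a Kubernetes "not found" API error (in a real operator, `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors` would be the usual check), and `decide`, `getSecret`, and the attempt limit are made-up names for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound error.
var errNotFound = errors.New("not found")

// errTransient is a hypothetical stand-in for a temporary API failure.
var errTransient = errors.New("temporary API failure")

// decision is what the reconciler should do after one attempt.
type decision int

const (
	succeeded decision = iota // nothing left to do
	dropGone                  // the pod or secret is gone: stop without logging an error
	giveUp                    // too many attempts: stop retrying
	retry                     // transient failure: requeue
)

func (d decision) String() string {
	return [...]string{"succeeded", "dropGone", "giveUp", "retry"}[d]
}

// decide classifies the outcome of a reconcile attempt.
// attempt is 1-based; maxAttempts is an assumed retry budget.
func decide(err error, attempt, maxAttempts int) decision {
	switch {
	case err == nil:
		return succeeded
	case errors.Is(err, errNotFound):
		return dropGone
	case attempt >= maxAttempts:
		return giveUp
	default:
		return retry
	}
}

// getSecret simulates a lookup and wraps the sentinel error,
// similar in shape to the wrapped error in the log above.
func getSecret(existing map[string]bool, name string) error {
	if !existing[name] {
		return fmt.Errorf("getting secret by name: %s: %w", name, errNotFound)
	}
	return nil
}

func main() {
	secrets := map[string]bool{"long-lived-secret": true}
	for _, name := range []string{"long-lived-secret", "runner-my-secret-runner-123-1234"} {
		err := getSecret(secrets, name)
		fmt.Printf("%s -> %v\n", name, decide(err, 1, 5))
	}
}
```

With controller-runtime, returning a nil error from `Reconcile` ends the retry loop for that request, so `dropGone` and `giveUp` would presumably map to returning nil (perhaps with a lower-severity log line), while `retry` would map to returning the error.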
