What happened:
Logs are getting flooded because the driver has an endless retry mechanism; effectively, the failing command spams the log.
What you expected to happen:
Logs should include a retry counter when errors occur during bucket creation/access/grant.
Since the same error message is repeated day and night if the failure is not fixed, the user should have control over how many times the system retries.
How to reproduce this bug (as minimally and precisely as possible):
Induce an error in any of the bucket workflows (creation/access/grant).
Look at the provisioner log; the same error message is repeated day and night if the failure is not fixed.
If the issue remains for a couple of days, it will eat all the memory and disk space of the system.
BlaineEXE changed the title from "[DATE] - [sidecar] logs are getting flooded as sidecar code is have endless Retry mechanism" to "[14 Jan 2025] - [sidecar] logs are getting flooded as sidecar code is have endless Retry mechanism" on Jan 24, 2025
The sidecar is expected to be an operator of sorts, so I do expect it to retry until it's successful. That's part of the control theory that keeps Kubernetes and its systems stable.
That said, I also normally expect some sort of backoff mechanism. I would propose that the fix for this issue report focus on ensuring a reasonable backoff. We can probably start with the backoff strategy and timing recommended by controller-runtime (reference needed) and readjust as needed in the future if this continues to come up.
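To make this concrete, here is a minimal sketch of the kind of per-item exponential backoff that controller-runtime's default rate limiter is built on, using client-go's workqueue package. The base delay, cap, and queue key below are illustrative assumptions, not values taken from the sidecar:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Exponential per-item backoff with the values controller-runtime
	// has historically used by default (base 5ms, cap 1000s); treat
	// these as illustrative and check the pinned version.
	limiter := workqueue.NewItemExponentialFailureRateLimiter(
		5*time.Millisecond, 1000*time.Second)

	item := "bucket-create/my-bucket" // hypothetical queue key

	// Each call to When records another failure and returns the next
	// delay: 5ms, 10ms, 20ms, ... up to the cap.
	for i := 0; i < 5; i++ {
		fmt.Printf("requeue #%d after %v\n",
			limiter.NumRequeues(item)+1, limiter.When(item))
	}

	// On a successful reconcile, forget the item so backoff resets.
	limiter.Forget(item)
}
```

With this scheme, a persistent failure is re-enqueued with roughly double the previous delay each time, so the log line appears progressively less often instead of repeating at a fixed interval.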
One of my first thoughts when triaging/planning bugs is what sort of testing is needed. Regression tests are very important.
For bugs related to reconcile retry timing and logging, I have found that timing-related log-output expectations are hard to codify. In Rook, we tend to not create regression tests for these cases and instead just try to do our best to make sure system internals are logging helpful info without frequent spam.
@shanduur , my inclination here is to not require deeply involved unit/e2e tests, but I'm curious about your input here as well.
The issue is:
Retries for the COSI APIs are handled by the sidecar (https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar), where an error sends it into an endless retry loop. The sidecar does not currently stop retrying after some time. We should have a tunable retry counter for this.
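As a sketch of what a user-configurable retry cap with a visible counter in the log line could look like (createBucket, maxRetries, and the log wording are hypothetical stand-ins, not the sidecar's actual API), using apimachinery's wait.ExponentialBackoff:

```go
package main

import (
	"errors"
	"log"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// createBucket is a stand-in for the failing provisioner call.
func createBucket() error { return errors.New("access denied") }

func main() {
	// maxRetries would come from user configuration in a real fix;
	// the name and default are hypothetical.
	maxRetries := 5

	backoff := wait.Backoff{
		Duration: 2 * time.Second, // initial delay
		Factor:   2.0,             // double the delay each attempt
		Jitter:   0.1,             // add up to 10% jitter
		Steps:    maxRetries,      // stop after maxRetries attempts
	}

	attempt := 0
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		attempt++
		if err := createBucket(); err != nil {
			// Include the attempt counter so a persistent failure
			// reads as "attempt N/M" rather than identical spam.
			log.Printf("bucket creation failed (attempt %d/%d): %v",
				attempt, maxRetries, err)
			return false, nil // retry
		}
		return true, nil // done
	})
	if err != nil {
		log.Printf("giving up after %d attempts: %v", attempt, err)
	}
}
```

wait.ExponentialBackoff returns an error once Steps is exhausted, which gives a natural hook for a final "giving up" log line instead of retrying forever.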