Improve error handling on upload failures #216
Comments
Background: Currently, a permission failure waits for more than 1.5 hours before showing any sign of which permission was missing. Shortening that interval, and possibly showing a warning early, would be nice.
Also related: #167
Reading up on the calls used here, the aws-sdk-rust client is also doing retries internally. From the docs:
So inside the loop where In addition to recognizing non-retryable errors,
Coldsnap has built-in retries with increasing backoff delays when uploading blocks. It can be hard to tell what is happening during this time, since there is no output while the retries are happening.
https://github.com/awslabs/coldsnap/blob/develop/src/upload.rs#L171-L188
It might be useful to add a `--verbose` flag to the command to get a little more insight into what is going on. Or just emit some sort of warning message by default whenever a retry is happening.
The number of retries also seems a little too high. `SNAPSHOT_BLOCK_ATTEMPTS` is currently set to 12. It seems likely that if the upload does not succeed after 3-5 attempts, it's not going to.
It would also be good if coldsnap recognized failures that are not worth retrying because they are not transient. Things like `AccessDeniedException`, as @grosser encountered in bottlerocket-os/bottlerocket#2667, should just fail immediately.
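To make the three suggestions concrete (fewer attempts, a visible warning on each retry, and failing fast on non-transient errors), here is a minimal sketch using only the standard library. It is not coldsnap's actual code: `UploadError`, `classify`, and `upload_with_retries` are hypothetical names introduced for illustration, and the only error code treated as non-retryable is the `AccessDeniedException` mentioned above.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification, not a type from coldsnap or the AWS SDK.
#[derive(Debug, Clone, PartialEq)]
pub enum UploadError {
    /// Possibly transient (throttling, dropped connection): worth retrying.
    Transient(String),
    /// Will not succeed on retry (for example, a missing permission).
    Permanent(String),
}

/// Map a service error code to a retry decision.
/// Assumption: the list holds only the code named in this issue; a real
/// list would need to be checked against the service's documented errors.
pub fn classify(code: &str, message: &str) -> UploadError {
    const NON_RETRYABLE: &[&str] = &["AccessDeniedException"];
    let detail = format!("{}: {}", code, message);
    if NON_RETRYABLE.contains(&code) {
        UploadError::Permanent(detail)
    } else {
        UploadError::Transient(detail)
    }
}

/// Run `op` up to `max_attempts` times (always at least once), doubling the
/// delay after each transient failure and printing a warning before each
/// retry. Permanent errors are returned immediately without retrying.
pub fn upload_with_retries<T, F>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: F,
) -> Result<T, UploadError>
where
    F: FnMut(u32) -> Result<T, UploadError>,
{
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Fail fast: retrying cannot fix a permission problem.
            Err(UploadError::Permanent(msg)) => return Err(UploadError::Permanent(msg)),
            // Out of attempts: surface the last transient error.
            Err(UploadError::Transient(msg)) if attempt >= max_attempts => {
                return Err(UploadError::Transient(msg));
            }
            Err(UploadError::Transient(msg)) => {
                eprintln!(
                    "warning: attempt {}/{} failed ({}); retrying in {:?}",
                    attempt, max_attempts, msg, delay
                );
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

The closure receives the attempt number so a caller could log it or thread it into the real upload call. With a cap of 3-5 attempts and a warning on stderr for each retry, a permission failure would surface on the first attempt instead of after the full backoff schedule.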