Node install stalls because of large retry count #107

Open
ctrox opened this issue Feb 16, 2023 · 0 comments

ctrox commented Feb 16, 2023

On RKE2, we have observed that the machine-provision pod can get stuck for hours because of the very large retry count of 4500. This mainly seems to happen in retrieve_connection_info, which, incidentally, does not exit 1 even after it has exhausted all of its retries.

Regardless of what actually causes retrieve_connection_info to keep failing, wouldn't it make sense to use a more reasonable RETRY_COUNT here? Provisioning would then fail faster and retry by creating a whole new machine.
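
To illustrate what I have in mind, here is a minimal sketch of a bounded retry helper that returns non-zero once the attempts are used up. It is not the project's actual code: the `retry` function, the `RETRY_DELAY` variable and the default values of 30 and 2 are assumptions made up for this example; only `RETRY_COUNT` and `retrieve_connection_info` are names from the existing script.

```shell
#!/bin/sh
# Hypothetical defaults for illustration; the real script's values may differ.
RETRY_COUNT="${RETRY_COUNT:-30}"
RETRY_DELAY="${RETRY_DELAY:-2}"   # seconds between attempts (assumed name)

# retry CMD [ARGS...]
# Runs CMD until it succeeds or RETRY_COUNT attempts have failed.
# Returns 0 on success and 1 when every attempt failed.
retry() {
    attempt=1
    while [ "$attempt" -le "$RETRY_COUNT" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$RETRY_COUNT failed: $*" >&2
        # Skip the pause after the final attempt so the failure surfaces at once.
        if [ "$attempt" -lt "$RETRY_COUNT" ]; then
            sleep "$RETRY_DELAY"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A caller could then use something like `retry retrieve_connection_info || exit 1`, so that the pod fails visibly after the last attempt instead of carrying on.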
