fix: prevent link and clustermachine deletion from getting stuck #106

Merged on Apr 3, 2024 (1 commit)

utkuozdemir
Member

This PR contains fixes for the following issues:

  • When a Link resource was deleted, MachineCleanupController was unable to remove finalizers on the MachineSetNode resource, as it did not pass the correct owner "" when calling Teardown.
  • ClusterMachineStatusController not handling the cases where the matching Machine might be missing.
  • MachineSetController not handling the cases where the Machine resource was missing for any of the MachineSetNodes that were tearing down.
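The first item can be illustrated with a toy model. This is a minimal sketch, not the project's code: `Store`, `Resource`, `Cleanup`, and the error values are hypothetical names invented here, and the only behavior assumed from the description is that `Teardown` is rejected when the caller passes an owner that differs from the resource's owner, which leaves the finalizer in place.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for this sketch.
var (
	ErrNotFound      = errors.New("resource not found")
	ErrOwnerConflict = errors.New("owner conflict")
)

// Resource is a toy resource. An empty Owner means it was created by a
// user rather than by a controller (an assumption of this sketch).
type Resource struct {
	Owner       string
	TearingDown bool
	Finalizers  map[string]struct{}
}

// Store is a toy in-memory resource store.
type Store struct {
	resources map[string]*Resource
}

func NewStore() *Store {
	return &Store{resources: map[string]*Resource{}}
}

// Create adds a resource with the given owner and finalizers.
func (s *Store) Create(id, owner string, finalizers ...string) {
	res := &Resource{Owner: owner, Finalizers: map[string]struct{}{}}
	for _, f := range finalizers {
		res.Finalizers[f] = struct{}{}
	}
	s.resources[id] = res
}

// Teardown marks the resource as tearing down, but only if the caller
// passes the owner recorded on the resource.
func (s *Store) Teardown(id, owner string) error {
	res, ok := s.resources[id]
	if !ok {
		return fmt.Errorf("teardown %q: %w", id, ErrNotFound)
	}
	if res.Owner != owner {
		return fmt.Errorf("teardown %q as owner %q: %w", id, owner, ErrOwnerConflict)
	}
	res.TearingDown = true
	return nil
}

// RemoveFinalizer drops one finalizer from the resource.
func (s *Store) RemoveFinalizer(id, finalizer string) error {
	res, ok := s.resources[id]
	if !ok {
		return fmt.Errorf("remove finalizer on %q: %w", id, ErrNotFound)
	}
	delete(res.Finalizers, finalizer)
	return nil
}

// Cleanup tears the resource down and then removes the finalizer.
// If Teardown fails, the finalizer is never removed.
func Cleanup(s *Store, id, owner, finalizer string) error {
	if err := s.Teardown(id, owner); err != nil {
		return err
	}
	return s.RemoveFinalizer(id, finalizer)
}
```

In this model, calling `Cleanup` on a user-created resource with a controller name as the owner returns an owner-conflict error and leaves the finalizer in place, while passing `""` succeeds.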

The latter two fixes prevent clusters from getting stuck if they are in a state where there is a missing machine in their machine sets.
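The idea behind these two fixes, treating a missing machine as a state to handle rather than an error that blocks progress, might look roughly like the sketch below. All names here (`Machine`, `NodeStatus`, `StatusFor`, `SplitTearingDown`) are hypothetical, and the rule that a connected machine must wait before its node is finalized is invented purely for illustration; the actual controllers may apply different conditions.

```go
package main

// Machine is a hypothetical stand-in for the Machine resource.
type Machine struct {
	Connected bool
}

// NodeStatus is a hypothetical status computed for one cluster machine.
type NodeStatus struct {
	ID             string
	Ready          bool
	MachineMissing bool
}

// StatusFor builds a status even when no matching Machine exists,
// instead of failing the whole computation.
func StatusFor(id string, machines map[string]Machine) NodeStatus {
	machine, ok := machines[id]
	if !ok {
		return NodeStatus{ID: id, MachineMissing: true}
	}
	return NodeStatus{ID: id, Ready: machine.Connected}
}

// SplitTearingDown partitions tearing-down node IDs into those that can
// be finalized now and those that must wait. A node whose Machine is
// missing has nothing left to wait for, so it is finalized right away.
func SplitTearingDown(nodeIDs []string, machines map[string]Machine) (finalize, wait []string) {
	for _, id := range nodeIDs {
		machine, ok := machines[id]
		switch {
		case !ok:
			// Missing machine: do not block teardown on it.
			finalize = append(finalize, id)
		case machine.Connected:
			// Illustrative rule only: a connected machine still needs cleanup.
			wait = append(wait, id)
		default:
			finalize = append(finalize, id)
		}
	}
	return finalize, wait
}
```

Without the `!ok` branch, a node pointing at a missing machine would never leave the waiting set, which is the kind of stuck state described above.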

Closes #100, closes #97.

Member

@smira smira left a comment

I still don't understand how we got into a state where ClusterMachine exists while Machine doesn't - it should never be possible...

@utkuozdemir
Member Author

> I still don't understand how we got into a state where ClusterMachine exists while Machine doesn't - it should never be possible...

Yes, that's really confusing, but it happened and we need to recover from it atm.
I have a couple of ideas for reproducing it and will give them a try. It is probably another bug.

This PR contains fixes for the following issues:
- When a `Link` resource was deleted, `MachineCleanupController` was unable to remove finalizers on the `MachineSetNode` resource, as it did not pass the correct owner `""` when calling `Teardown`.
- `ClusterMachineStatusController` not handling the cases where the matching `Machine` might be missing.
- `MachineSetController` not handling the cases where the `Machine` resource was missing for any of the `MachineSetNode`s that were tearing down.

The latter two fixes prevent clusters from getting stuck if they are in a state where there is a missing machine in their machine sets.

Closes #100, closes #97.

Signed-off-by: Utku Ozdemir <[email protected]>
@utkuozdemir force-pushed the fix-stuck-clustermachine-deletion branch from 752f03b to 5dc2eaa on April 3, 2024 10:46
@utkuozdemir
Member Author

/m

@talos-bot talos-bot merged commit 5dc2eaa into main Apr 3, 2024
18 checks passed
@talos-bot talos-bot deleted the fix-stuck-clustermachine-deletion branch April 3, 2024 11:23
Development

Successfully merging this pull request may close these issues.

  • 🐛 cluster got stuck destroying
  • [feature] Support force-deleting a cluster
3 participants