feat: drain and volume detachment status conditions #1876
base: main
Force-pushed from 4bb4d97 to fb3ac47.
Pull Request Test Coverage Report for Build 12982696607 (Coveralls)
/assign @engedaam
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
/remove-lifecycle stale
Force-pushed from b527992 to 21176e1.
/hold
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: engedaam, jmdeal.
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from d536a96 to 43949ef.
/lgtm
```go
if err != nil {
	return reconcile.Result{}, fmt.Errorf("listing nodeclaims, %w", err)
if nodeutils.IsDuplicateNodeClaimError(err) || nodeutils.IsNodeClaimNotFoundError(err) {
```
nit: Should we put a comment over this one indicating that we don't expect this case to happen, and that if it does, something has gone wrong and we have broken some tenet of the system?
```go
}

if err = c.deleteAllNodeClaims(ctx, nodeClaims...); err != nil {
	return reconcile.Result{}, fmt.Errorf("deleting nodeclaims, %w", err)
// If the underlying NodeClaim no longer exists, we want to delete to avoid trying to gracefully drain nodes that are
```
Remind me again: I recall there was a bug and a reason that we moved this up -- something with us getting stuck on the terminationGracePeriod and continually trying to drain even if the instance was already terminated, right?
Since we were only checking this in the drain logic, if we drained but were then stuck awaiting volume attachments, we never hit this check and could get stuck indefinitely. I don't think there was any interaction with `terminationGracePeriod`; if anything, it would save users in that case.
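The fix described here amounts to running the instance-existence check before every stage that can block, so a terminated instance short-circuits both the drain and the volume-detachment wait. A minimal sketch, assuming a hypothetical `state` snapshot and string outcomes. It models only the ordering, not the real controller (the "remove finalizer" outcome matches the `removeFinalizer` call in the excerpt further down).

```go
package main

import "fmt"

// state is a hypothetical snapshot of what the termination reconciler observes.
type state struct {
	instanceExists  bool // false if the cloud provider reports the instance is gone
	podsRemaining   int  // drainable pods still on the node
	attachmentsLeft int  // VolumeAttachments still bound to the node
}

// step returns the outcome of one reconcile pass.
func step(s state) string {
	// Checked first, before any stage that can block. If this check lived only inside
	// the drain stage, a node that had finished draining but was still waiting on
	// VolumeAttachments would never re-check it and could wait indefinitely.
	if !s.instanceExists {
		return "remove finalizer"
	}
	if s.podsRemaining > 0 {
		return "requeue: draining"
	}
	if s.attachmentsLeft > 0 {
		return "requeue: awaiting volume detachment"
	}
	return "terminate instance"
}

func main() {
	// Already drained and stuck on volumes, but the instance is gone.
	fmt.Println(step(state{instanceExists: false, attachmentsLeft: 2}))
}
```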
```go
if err = c.terminator.Taint(ctx, node, v1.DisruptedNoScheduleTaint); err != nil {
	if errors.IsConflict(err) {
	if errors.IsConflict(err) || errors.IsNotFound(err) {
```
If the node is no longer found, why would we choose to requeue in that case?
I think this came out of a previous review: this deduplicated the Node not found logic by just relying on the check at the top, at the cost of an extra reconcile.
Right, it just seems odd because it's completely counter to every other controller where we perform this logic. It seems odd not to handle it here, since the error check is the same anyway and the check is free.
```go
if cloudprovider.IsNodeClaimNotFoundError(err) {
	return reconcile.Result{}, c.removeFinalizer(ctx, node)
stored := nodeClaim.DeepCopy()
if modified := nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeDrained, "Draining", "Draining"); modified {
```
Should it be Drained (Unknown) here, since we are in the process of draining but haven't completed our drain logic? Once it completes, we would mark the status as Drained=True.
My thought is that at this point we know it is not drained, since it's in the process of draining, whereas before we do the check we don't know whether there are any drainable pods on the node, so Drained is Unknown.
See: #1876 (comment), but I personally disagree with this framing. I think we should have handled InstanceTerminated that way too, going from Unknown to True/False there as well. In general, transitioning from True -> False or False -> True for a terminal status condition is a little odd, because it suggests that the process has finished when in fact it hasn't.
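The excerpts rely on a setter that reports whether anything changed (`if modified := ...SetFalse(...); modified {`), so status is only patched on a real transition. Below is a minimal stdlib model of that pattern, tracing the author's proposed `Drained` sequence (Unknown before the check, False while draining, True when done). The `conditions` type and `set` method are hypothetical, not Karpenter's `StatusConditions()` API. The status strings follow the usual Kubernetes "True"/"False"/"Unknown" convention. Reasons other than `Draining` are made up for illustration.

```go
package main

import "fmt"

type conditionStatus string

const (
	statusTrue    conditionStatus = "True"
	statusFalse   conditionStatus = "False"
	statusUnknown conditionStatus = "Unknown"
)

type condition struct {
	Status  conditionStatus
	Reason  string
	Message string
}

// conditions is a hypothetical, minimal stand-in for a status-condition set.
type conditions map[string]condition

// set records a condition and reports whether anything changed, so the caller can
// skip the status patch when nothing did.
func (c conditions) set(condType string, s conditionStatus, reason, message string) bool {
	next := condition{Status: s, Reason: reason, Message: message}
	if prev, ok := c[condType]; ok && prev == next {
		return false
	}
	c[condType] = next
	return true
}

func main() {
	conds := conditions{}
	trace := []struct {
		status conditionStatus
		reason string
	}{
		{statusUnknown, "AwaitingReconciliation"}, // hypothetical reason
		{statusFalse, "Draining"},
		{statusFalse, "Draining"}, // a later pass with no change: no patch needed
		{statusTrue, "Drained"},   // hypothetical reason
	}
	for _, t := range trace {
		modified := conds.set("Drained", t.status, t.reason, t.reason)
		fmt.Println(t.status, t.reason, "modified:", modified)
	}
}
```

Under the reviewer's convention, only the second entry would change: it would stay `Unknown` until the drain reaches a terminal state.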
```go
} else if !c.hasTerminationGracePeriodElapsed(nodeTerminationTime) {
	c.recorder.Publish(terminatorevents.NodeAwaitingVolumeDetachmentEvent(node))
	stored := nodeClaim.DeepCopy()
	if modified := nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeVolumesDetached, "AwaitingVolumeDetachment", "AwaitingVolumeDetachment"); modified {
```
Can this also be set to Unknown, since we are in the process of detaching the volumes and it hasn't hit a terminal state yet?
Similar to `Drained`, I think it's clearer to set it to `False` here, since we know the volumes aren't detached. Unknown indicates to me that we don't know one way or the other.
I think this is counter to how we have been treating status conditions throughout the project. We use status conditions to indicate that a process hasn't completed and that we don't know whether it's going to succeed or fail. For instance, the Completed status condition for a job doesn't go into a False state while the job is running; it stays Unknown because we don't know whether the job is going to complete, and then transitions to True/False based on whether it entered a terminal state.
```go
	return reconcile.Result{RequeueAfter: 1 * time.Second}, nil
} else {
	stored := nodeClaim.DeepCopy()
	if modified := nodeClaim.StatusConditions().SetFalse(v1.ConditionTypeVolumesDetached, "TerminationGracePeriodElapsed", "TerminationGracePeriodElapsed"); modified {
```
I could see us setting it to False here, since this indicates that we failed to detach the volumes and had to terminate because we hit our terminationGracePeriod on the node.
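Taken together, the reviewer's convention for `VolumesDetached` could look like the following sketch: Unknown while waiting, True on success, and False only on the terminal grace-period failure. The current PR code instead sets False while waiting. `volumesDetachedStatus` is a hypothetical helper. The reasons `AwaitingVolumeDetachment` and `TerminationGracePeriodElapsed` come from the PR, but using `VolumesDetached` as the True reason is an assumption.

```go
package main

import "fmt"

// volumesDetachedStatus derives (status, reason) under the reviewer's convention.
// Hypothetical helper; "VolumesDetached" as the True reason is an assumption.
func volumesDetachedStatus(attachmentsLeft int, gracePeriodElapsed bool) (status, reason string) {
	switch {
	case attachmentsLeft == 0:
		// Terminal success: nothing is bound to the node any more.
		return "True", "VolumesDetached"
	case gracePeriodElapsed:
		// Terminal failure: we stop waiting because the node's terminationGracePeriod
		// elapsed, not because the volumes detached.
		return "False", "TerminationGracePeriodElapsed"
	default:
		// In progress: we don't know yet which terminal state we will reach.
		return "Unknown", "AwaitingVolumeDetachment"
	}
}

type observation struct {
	attachmentsLeft    int
	gracePeriodElapsed bool
}

func main() {
	for _, o := range []observation{{2, false}, {2, true}, {0, false}} {
		s, r := volumesDetachedStatus(o.attachmentsLeft, o.gracePeriodElapsed)
		fmt.Printf("%+v -> %s (%s)\n", o, s, r)
	}
}
```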
```go
// 404 = the nodeClaim no longer exists
if errors.IsNotFound(err) {
	continue
if volumesDetached {
```
Is there a way to clean up some of this logic so that it's not so nested? Comments might also help. The fact that we fall through when volumes are detached and reach the bottom of the function (we don't requeue) is a bit confusing IMO.
```go
InvolvedObject: node,
Type:           corev1.EventTypeNormal,
Reason:         "AwaitingVolumeDetachment",
Message:        "Awaiting deletion VolumeAttachments bound to node",
```
Would it be useful to list the VolumeAttachments we are waiting on (or maybe a pretty-printed list of them) in this event message?
I didn't, for the same reason I didn't include the pods on the `Drained` status condition: we would either be hammering the API server as the list changes or the information would be out of date frequently. The former is a non-starter IMO, and the latter makes me feel like it isn't worth it. I think we can add a troubleshooting entry to the docs on how to find blocking volume attachments if needed.
> hammering the API server as the list changes or the information would be out of date frequently

You can make it so that the events are only fired at a certain frequency and are deduped without considering their message. Honestly, I think including the extra info could be helpful, and if some of them are actually stuck, it would be really valuable information for a user to have.
Force-pushed from 43949ef to 4ed938c.
New changes are detected. LGTM label has been removed.
Fixes #N/A
Description
Adds status conditions for node drain and volume detachment to improve observability of the individual termination stages. This is a scoped-down version of #1837, which includes these changes and also splits each termination stage into a separate controller. I will continue to work on that refactor, but I'm decoupling these changes so I can focus on higher-priority work.
Status Conditions:
- `Drained`: … `do-not-disrupt`, etc.). Karpenter will not proceed to instance termination when `Drained` is in this state.
- `VolumesDetached`: … `Drained` transitions to true.

How was this change tested?
`make presubmit`
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.