There is a specific race condition in which the system-agent will unnecessarily restart and/or reapply a plan.
The following sequence of events can cause it:
1. The CAPR planner delivers a new plan to the system-agent.
2. The system-agent takes the plan, applies it, and updates the secret with the applied-checksum.
3. The CAPR plansecret controller sees the updated plan secret and proceeds to update the appliedPlan on the secret.
4. In the meantime, the system-agent has re-enqueued and is running the probes for the second iteration. When it finishes and tries to update the secret, the CAPR plansecret controller has already beaten it to the write, so the system-agent gets a conflict error due to a mismatched resourceVersion (RV).
The proposed fix is for the system-agent to retrieve the latest secret from the API server, verify that the applied checksum still matches the plan that was just applied, and, if so, update the latest secret. This is a safe operation because the system-agent and CAPR have a contract that makes each responsible for its own specific keys.
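
The proposed refetch-and-verify step could look roughly like the sketch below. It is not the system-agent's actual code: `Store`, `Secret`, `ErrConflict`, `updateWithRefetch`, and the `probe-statuses` key are hypothetical stand-ins, and an in-memory store simulates the API server's resourceVersion check so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for the API server's conflict error on a stale resourceVersion.
var ErrConflict = errors.New("conflict: resourceVersion mismatch")

// Secret is a simplified, hypothetical stand-in for a Kubernetes Secret.
type Secret struct {
	ResourceVersion int
	Data            map[string]string
}

// copyData returns an independent copy of a data map.
func copyData(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// Store is a hypothetical in-memory API server holding a single secret.
type Store struct{ current Secret }

// Get returns a copy of the latest secret.
func (s *Store) Get() Secret {
	return Secret{ResourceVersion: s.current.ResourceVersion, Data: copyData(s.current.Data)}
}

// Update applies optimistic concurrency: it rejects writes based on a stale resourceVersion.
func (s *Store) Update(in Secret) (Secret, error) {
	if in.ResourceVersion != s.current.ResourceVersion {
		return Secret{}, ErrConflict
	}
	s.current = Secret{ResourceVersion: in.ResourceVersion + 1, Data: copyData(in.Data)}
	return s.Get(), nil
}

// updateWithRefetch writes the agent-owned keys. On a conflict it refetches the
// latest secret, checks that applied-checksum still matches the plan that was
// just applied, and only then writes the agent-owned keys onto the latest copy.
func updateWithRefetch(s *Store, stale Secret, appliedChecksum string, agentKeys map[string]string) (Secret, error) {
	attempt := Secret{ResourceVersion: stale.ResourceVersion, Data: copyData(stale.Data)}
	for k, v := range agentKeys {
		attempt.Data[k] = v
	}
	out, err := s.Update(attempt)
	if !errors.Is(err, ErrConflict) {
		return out, err
	}
	latest := s.Get()
	if latest.Data["applied-checksum"] != appliedChecksum {
		return Secret{}, fmt.Errorf("applied-checksum changed, not updating: %w", err)
	}
	for k, v := range agentKeys {
		latest.Data[k] = v // only agent-owned keys are touched; appliedPlan is preserved
	}
	return s.Update(latest)
}

func main() {
	store := &Store{current: Secret{ResourceVersion: 1, Data: map[string]string{"applied-checksum": "abc123"}}}
	stale := store.Get() // the agent's view before running probes

	// The plansecret controller writes appliedPlan first, bumping the resourceVersion.
	capr := store.Get()
	capr.Data["appliedPlan"] = "plan-contents"
	if _, err := store.Update(capr); err != nil {
		panic(err)
	}

	out, err := updateWithRefetch(store, stale, "abc123", map[string]string{"probe-statuses": "healthy"})
	fmt.Println(out.ResourceVersion, out.Data, err)
}
```

The sketch retries only once and gives up if the checksum has changed, since in that case a newer plan has been applied and the stale probe results should not be written on top of it.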