Make podRef to be unique #433
Conversation
Sometimes, we want to run the reconciler outside of the cluster for development convenience. This patch allows the reconciler process to use the KUBECONFIG environment variable. Signed-off-by: Peng Liu <[email protected]>
Fixes #176
Why have you chosen that format? If you just need the pod ref to be unique, shouldn't we simply use the pod's UID? Is the name part meant to help debugging or something like that?
The new naming format will be namespace/pod_name:pod_uid. And yes, it's meant for debugging.
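For illustration, a tiny sketch of how a podRef in the new format could be composed; the helper name and the example UID are hypothetical, not the PR's actual code:

package main

import "fmt"

// composePodRef is a hypothetical helper showing the new identifier layout:
// <namespace>/<pod_name>:<pod_uid>
func composePodRef(namespace, podName, podUID string) string {
	return fmt.Sprintf("%s/%s:%s", namespace, podName, podUID)
}

func main() {
	// Prints e.g. "default/my-pod:0e54e162-66d5-4a46-9428-eabe1d4e372b"
	fmt.Println(composePodRef("default", "my-pod", "0e54e162-66d5-4a46-9428-eabe1d4e372b"))
}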
This looks good to me. I'd love to have another review on it before a merge, but overall this approach makes sense. Thank you Peng!
cfg, err = clientcmd.BuildConfigFromFlags("", kubeConfigFile)
if err != nil {
	return nil, fmt.Errorf("failed to generate kubeconfig from file %s: %v", kubeConfigFile, err)
}
Thanks for the improved logging on error.
}

// migrate the podRef format if needed
if err := looper.migrationPodRef(ctx, ipPools); err != nil {
OK, I think I get how it triggers the migration: here in the control loop, we check whether the migration is needed and apply it.
pkg/reconciler/iploop.go (Outdated)
k8sClient:           *k8sClient,
liveWhereaboutsPods: indexPods(pods, whereaboutsPodRefs),
requestTimeout:      timeout,

pods, err := k8sClient.ListPods(ctx)
Can we limit this to just pods with */networks annotations, maybe? Unsure if this is premature optimization.
Something like:
// Define label selector to filter pods with specific annotations
labelSelector := metav1.LabelSelector{
	MatchExpressions: []metav1.LabelSelectorRequirement{
		{
			Key:      "k8s.v1.cni.cncf.io/networks",
			Operator: metav1.LabelSelectorOpExists,
		},
	},
}

// List pods matching the label selector
pods, err := k8sClient.CoreV1().Pods("").List(ctx, metav1.ListOptions{
	LabelSelector: metav1.FormatLabelSelector(&labelSelector),
})
Sorry, I had this wrong; this is abstracted. It's fine as is. Thanks!
pkg/reconciler/iploop_test.go (Outdated)
fakewbclient "github.com/k8snetworkplumbingwg/whereabouts/pkg/client/clientset/versioned/fake"
)

var _ = FDescribe("MigrationPodRef", func() {
Thanks for the migration unit test.
You need to remove the focus from these tests; otherwise, you'll hijack the CI into running only these tests.
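For context, this is what dropping the Ginkgo focus looks like; a minimal sketch, not the PR's actual test body (the Ginkgo v2 import path is assumed here, v1 would be "github.com/onsi/ginkgo"):

package reconciler_test

import (
	. "github.com/onsi/ginkgo/v2"
)

// Focused: Ginkgo will run only this container and skip the rest of the suite.
var _ = FDescribe("MigrationPodRef", func() { /* ... */ })

// Unfocused: the same specs run alongside the rest of the suite.
var _ = Describe("MigrationPodRef", func() { /* ... */ })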
Good catch. I forgot it.
/hold
We used podRef with format <namespace>/<pod_name> to identify the IP allocation. However, in scenarios like ReplicaSets, a different pod can reuse the pod name. This patch changes the format of podRef to <namespace>/<pod_name>:<pod_uid> to ensure each IP allocation has a unique identifier. The legacy podRef format will be updated automatically during an upgrade. Signed-off-by: Peng Liu <[email protected]>
The containerID of a pod changes after a node reboot, which caused the IP allocation/deallocation operation to be unable to find the IP that had been reserved by the pod. Signed-off-by: Peng Liu <[email protected]>
if err != nil {
	return nil, fmt.Errorf("failed to implicitly generate the kubeconfig: %w", err)
	logging.Debugf("failed to generate the kubeconfig from service account: %v", err)
	kubeConfigFile := os.Getenv("KUBECONFIG")
Both newPodController and ReconcileIPs are exclusively called from controlloop.go:main(). I know the operation doesn't cost anything, but it rubs me the wrong way to do kubeConfigFile := os.Getenv("KUBECONFIG") in each of them; the correct place, IMO, is controlloop.go:main(), which can then pass kubeConfigFile as an argument to both newPodController and ReconcileIPs. Perhaps that even allows for more optimizations.
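A minimal sketch of that refactor; the function signatures below are stand-ins (the real newPodController and ReconcileIPs take more arguments):

package main

import "os"

// Hypothetical stand-ins for the real constructors; only the kubeConfigFile
// parameter matters for this illustration.
func newPodController(kubeConfigFile string) { /* ... */ }
func reconcileIPs(kubeConfigFile string)     { /* ... */ }

func main() {
	// Read KUBECONFIG once, in the single caller...
	kubeConfigFile := os.Getenv("KUBECONFIG")

	// ...and pass it down instead of re-reading it in each callee.
	newPodController(kubeConfigFile)
	reconcileIPs(kubeConfigFile)
}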
cfg, err := rest.InClusterConfig()
var cfg *rest.Config
var err error
cfg, err = rest.InClusterConfig()
If this is run inside the cluster, but the KUBECONFIG env variable is set:
- ip.go will use NewReconcileLooperWithKubeconfig due to the if condition
- newPodController will use rest.InClusterConfig, because in-cluster rest.InClusterConfig will succeed

So your code can end up using the in-cluster config for one, and the kubeconfig for the other.
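One way to keep the two code paths consistent would be a single helper that both call; a minimal sketch under that assumption, not the PR's actual fix:

package kubeclient

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// buildConfig resolves the client config the same way for every caller:
// an explicit kubeconfig always wins, even when running inside the cluster.
func buildConfig(kubeConfigFile string) (*rest.Config, error) {
	if kubeConfigFile != "" {
		return clientcmd.BuildConfigFromFlags("", kubeConfigFile)
	}
	return rest.InClusterConfig()
}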
@@ -39,6 +39,7 @@ const (
	cronSchedulerCreationError
	fileWatcherError
	couldNotCreateConfigWatcherError
	initialReconsileFailed
nit: initialReconsileFailed -> initialReconcileFailed
// trigger one immediate reconcile before cron job start
go reconciler.ReconcileIPs(errorChan)
err = <-errorChan
if err == nil {
nit:

if err = <-errorChan; err != nil {
	logging.Verbosef("initial reconciler failure: %s", err)
	os.Exit(initialReconcileFailed)
}
logging.Verbosef("initial reconciler success")
return logging.Errorf("failed to list overlappingrangeipreservations %v", err)
}
for _, ip := range reservations {
	if regexp.MustCompile(pattern).MatchString(ip.Spec.PodRef) {
nit - the regexp doesn't have to be compiled on every loop iteration:

const (
	podRefPattern = `^([^\s/]+)/([^\s/]+):([^\s/]+)$`
)

(...)

func (rl ReconcileLooper) migrationPodRef(ctx context.Context, ipPools []storage.IPPool) error {
	re := regexp.MustCompile(podRefPattern)

(...)

	for _, ip := range reservations {
		if re.MatchString(ip.Spec.PodRef) {
ipReservations = nil
needUpdate := false
for _, r := range pool.Allocations() {
	if !regexp.MustCompile(pattern).MatchString(r.PodRef) {
Ditto.
Expect(err).NotTo(HaveOccurred())

found := false
pattern := `^([^\s/]+)/([^\s/]+):([^\s/]+)$`
nit: reuse podRefPattern from iploop.go once that const is created, instead of redefining the pattern (less error prone):

const (
	podRefPattern = `^([^\s/]+)/([^\s/]+):([^\s/]+)$`
)
Expect(err).NotTo(HaveOccurred())
for _, pool := range ipPools {
	for _, r := range pool.Allocations() {
		if !regexp.MustCompile(pattern).MatchString(r.PodRef) {
nit: ditto as earlier; compile the regexp outside of the loop instead of compiling it every single time.
ips, err := reconcileLooper.k8sClient.ListOverlappingIPs(ctx)
Expect(err).NotTo(HaveOccurred())
for _, ip := range ips {
	if !regexp.MustCompile(pattern).MatchString(ip.Spec.PodRef) {
Ditto.
return nil, logging.Errorf("Error marshaling overlappingrangeipreservations.whereabouts.cni.cncf.io spec: %v", err)
}

return i.client.WhereaboutsV1alpha1().OverlappingRangeIPReservations(clusterWideIP.GetNamespace()).Patch(
question: does this have to be Patch? Don't we have an Update() for that? (I might be completely wrong)
regardless, isn't https://pkg.go.dev/k8s.io/client-go/util/retry#RetryOnConflict needed?
The IP reconciler runs on multiple nodes, so there would be race conditions if we used Update() in this case. However, using Patch() could cause too many API requests. We may need leader election here.
I agree that we need to add a retry mechanism. This is a general problem in the IP reconciler: it doesn't retry when reconciliation fails.
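For reference, a minimal sketch of the RetryOnConflict pattern linked above, assuming an Update()-based flow; the updatePodRef helper is hypothetical and not the PR's code (which uses Patch()):

package reconciler

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"

	wbclient "github.com/k8snetworkplumbingwg/whereabouts/pkg/client/clientset/versioned"
)

// updatePodRef rewrites the PodRef of an OverlappingRangeIPReservation,
// retrying on resourceVersion conflicts instead of failing outright.
func updatePodRef(ctx context.Context, client wbclient.Interface, namespace, name, newPodRef string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read the latest object so each attempt carries a fresh resourceVersion.
		res, err := client.WhereaboutsV1alpha1().OverlappingRangeIPReservations(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		res.Spec.PodRef = newPodRef
		_, err = client.WhereaboutsV1alpha1().OverlappingRangeIPReservations(namespace).Update(ctx, res, metav1.UpdateOptions{})
		return err
	})
}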
I've got a question ... will this actually work with StatefulSets and guarantee that the same IP address is always assigned to the same StatefulSet pod? Because the UID between StatefulSet pods changes.
I'm trying to understand when and why it would ever be a problem that an allocation is being reused (in the case of StatefulSets, isn't it even desired that an allocation is reused?) when the combination of .. well, just looking at this right now in an SNO cluster that I have, the IP address of my single StatefulSet pod actually changed after reboot .. To clarify my point:
The commit's comment is misleading, because the identifier [...] In Kubernetes, there will never ever be 2 pods with the same [...]
@andreaskaris I think you made a great point. Instead of addressing it from the IP reconciler's perspective, resolving it at the allocation stage would be more effective.
Replaced by #446