-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lighthouse agent is slow to process current EndpointSlices update events from broker during resync after restart #1706
Comments
What version of Submariner are you using? |
0.17.3 |
This looks similar to #1623 which eliminated the periodic resync but all EndpointSlices are still processed on startup and they're all added to the queue quickly which kicks in the rate limiter. |
Is it possible to prioritize processing current events over resynced events? From logs I see many of the resynced events are NOOPs, only few of these resynced events are actually missed events that require processing. Whereas current events should be processed as soon as possible; why do they get processed after all resynced events are done processing? One possible idea is to do partitioned rate limiting, one partition for ongoing events and one for resync events. That way there is always dedicated capacity for processing current events. |
We don't have control over the event processing ordering - that's handled by the K8s informer. On startup, it lists all current resources and notifies the event handler before starting the watch for new events. We could just remove the rate limiting altogether, at least the |
Thank you, we can try adjusting the rate limiting settings on our end. Also do you know if there is ongoing work to make submariner-lighthouse-agent deployment have multiple replicas with leader election? Or is this something we need to try ourselves? |
The current developers have no plans for that. |
Sorry, one more question. Why is there only one goroutine processing events from queue? Is there some concurrency concern for using multiple goroutines to process endpointslices in parallel? The bucket rate limiter is configured with 500 tokens and refills at 10 tokens per second. So rate limiter should only delay 1500 events by 100 seconds. I think the 10 minutes latency we are seeing is mostly from slow processing of events, and not due to rate limiting. Maybe processing events in parallel should improve latency, we can explore in our setup. |
The actual processing of the events in our code on sync from the broker is negligible (probably on the order of a few ms at most). The wildcard is in accessing the API server after transformation to update the local resource. If there is a significant total latency processing a single event, I'm sure the vast majority is accessing the API server and that's out of our hands, in which case I don't think parallelizing the processing is really going to help much with throughput. In fact it may make things worse by further overloading the API server. Then again, who knows, you can certainly experiment with it. I still suggest removing the bucket rate limiter and see if that helps enough. |
@tpantelis There could be some value in providing an easier user configurable way to tweak rate limiters, retry timers etc. As different users come up with more diverse deployments, one-size-fit-all approach will not work well. Maybe a configmap to tweak such params? |
Perhaps but I think we should just remove the bucket rate limiter. I think we get enough overall throttling via the single worker. |
So looking at this in more detail, with 1500 EPS's taking 10 min total to process (minus the 1+ min for the rate limiter), it takes about 300-400 ms per EPS on average. For the broker syncing, the processing once de-queued mainly entails:
4 has an artificial wait, up to 100 ms worst case, for the synced object to get into the local cache. This should only come into play for newly created EPS plus is only done for ClusterIP services and it sounds like yours are headless. The rest are (or should be) negligible except for 3b and 3c which involve requests to the API server. 3c shouldn't occur if nothing in the EPS was actually changed. So it seems reading the existing resource is taking most of the time, unless I'm missing something. I don't know if ~300 ms latency is normal but parallelizing would probably help at least somewhat. |
It is possible to hook in a backend priority queue to the client-go rate-limiting work queue using the |
submariner-io/admiral#1052 adds priority queueing. This will quicken the processing of resources that are updated/created after the initial queueing of existing resources on restart. Note that this won't help with a resource that was updated while the agent pod was down - it will get the same priority as all the other pre-existing resources that didn't change during that time so it will be arbitrarily delayed. |
What happened:
We have a fleet of 40 clusters with service discovery enabled using Submariner. We rely on Submariner to sync endpointslices across the fleet. We currently have over 1500 endpointslices in our broker cluster.
We have noticed an issue where a cluster in our fleet can stop syncing endpointslices from the broker for up to 10 minutes. This happens when submariner-lighthouse-agent pod restarts on the cluster. We have seen pod restarts due to preemption by Kubernetes scheduler and due to nodes getting tainted and drained for maintenance. We fixed the former by increasing submariner-lighthouse-agent scheduling priority. But node draining still affects submariner-lighthouse-agent availability.
When the agent pod restarts, we see the agent resync all endpointslices from broker to cluster (the processing seems to be done in endpointslice name/namespace alphabetical order). Even if endpointslices are updated on the broker during this resync, these update events do not get processed until the processing loop reaches the endpointslices in its alphabetical order. Due to the volume of endpointslices in our broker (> 1500), we see that endpointslices that are in the end of resync list get processed only after 10 mins.
What you expected to happen:
Is it possible to improve the availability of submariner-lighthouse-agent and make it able to tolerate random ocassional pod restarts? One solution is to increase agent replicas and add leader election to help keep agent keep running even when one of the pods goes down. We can discuss better alternative solutions.
How to reproduce it (as minimally and precisely as possible):
Create 2 clusters and join them to broker.
Create 1500 headless service/service exports in a cluster A. This should create 1500 endpointslices in broker.
Restart lighthouse agent in cluster B.
Restart a deployment for headless service in cluster A to trigger pod IP change in endpointslice.
Observe that endpointslice in broker cluster gets updated immediately. But endpointslice in cluster B will take some time to get latest changes.
Anything else we need to know?:
Environment: Linux
subctl diagnose all
):subctl gather
):The text was updated successfully, but these errors were encountered: