Introduce new config to randomly pick the xDS server host #972
Instead of using the default "pick_first" policy, which always picks the first IP address returned by DNS, "round_robin" will pick a random starting point in the list of IPs. This will fix the very lopsided load seen on the xDS servers.
Please share local testing results.
Can you add your ingraph link showing how skewed this problem is right now, i.e., why it deserves a fix now instead of waiting for a more generic solution?
Please also add background information on the root cause of the issue, i.e., why we need client-side round-robin given that DNS returns a random shuffle each time.
Here's the actual implementation of the round_robin load balancer, which does seem to use a random value for the first index: https://github.com/grpc/grpc-java/blob/3e8e56feea2b675c4238ce5e5cab93dc5c601fd5/util/src/main/java/io/grpc/util/RoundRobinLoadBalancer.java#L47
Here's a relevant test: https://github.com/grpc/grpc-java/blob/3e8e56feea2b675c4238ce5e5cab93dc5c601fd5/util/src/test/java/io/grpc/util/RoundRobinLoadBalancerTest.java#L383
@logstashbugreporter updated the PR description
lgtm.
Only that I couldn't think of a good way to test locally, since the client can't get info about the sub-channel picked in the grpc/xds channel.
Should be pretty simple to test locally. The observer host a client connects to is in the |
@PapaCharlie I suggest testing this locally and posting the results.
Yeah that's what I was doing today, will post the results soon
Tested this locally by adding the following to my
Without the |
Thanks for testing it locally too.
lgtm!
LGTM.
Just to make sure I understand this: without setting round_robin as the default LB policy, NettyChannelBuilder will use the first address in the sorted list of addresses?
@logstashbugreporter yeah, exactly. The sorting is based on IPv6 proximity, so it picks whichever observer it thinks is closest. The problem is that some observers seem to be "closer" than most, which is why we see this very skewed distribution. The round_robin policy ignores proximity and picks a random IP from the set of returned IPs.
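To make the effect described above concrete, here is a rough, self-contained simulation. It is not the grpc-java implementation and not the code in this PR: the class `LbPolicySketch`, its methods, and the observer names are hypothetical, and it assumes the simplest possible case in which every client sees the same proximity-sorted address list. Under that assumption, "pick_first" sends every client to index 0, while a random starting index (the behavior attributed to "round_robin" in this thread) spreads clients roughly evenly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: models only the initial choice of server per client.
final class LbPolicySketch {

    // pick_first as described in this thread: always the head of the sorted list.
    static int pickFirst(List<String> addresses) {
        return 0;
    }

    // round_robin as described in this thread: a random starting index.
    static int randomStart(List<String> addresses, Random rng) {
        return rng.nextInt(addresses.size());
    }

    // Counts how many simulated clients land on each address.
    // Assumption: all clients receive the identical, identically sorted list.
    static int[] simulate(int clients, List<String> addresses, boolean roundRobin, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[addresses.size()];
        for (int i = 0; i < clients; i++) {
            int index = roundRobin ? randomStart(addresses, rng) : pickFirst(addresses);
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder host names, not real observers.
        List<String> observers = List.of("observer-a", "observer-b", "observer-c", "observer-d");
        System.out.println("pick_first:  " + Arrays.toString(simulate(10_000, observers, false, 42L)));
        System.out.println("round_robin: " + Arrays.toString(simulate(10_000, observers, true, 42L)));
    }
}
```

In grpc-java, a policy name such as "round_robin" can be supplied through `defaultLoadBalancingPolicy(...)` on the channel builder; whether this PR wires it up exactly that way is not shown in the conversation here.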
Here is the current traffic distribution using pick_first: