Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Netty's CONNECT_TIMEOUT_MILLIS client channel option #715

Merged
merged 7 commits into from
Apr 5, 2024

Conversation

DerGuteMoritz
Copy link
Collaborator

Netty has a default connect timeout for client channels (currently 30s) which was only accessible by means of bootstrap-transform. Since it's a rather important option to get predictable connection behavior, expose it as connect-timeout in all relevant APIs.

This patch came about while working on connection establishment cancellation in the context of #714. Turns out that cancelling a connect promise currently has no effect in Netty. Instead, the connection would stick around for roughly 30s. Now I was using the HTTP client for this particular test whose connection-timeout defaults to 60s so this didn't line up. Grepping Netty's source code for "30000" quickly lead me to the DEFAULT_CONNECT_TIMEOUT constant. This of course means that the HTTP client's default of 60s is practically ineffective since Netty's default connect timeout would always fire first! The test case I include in the patch also demonstrates this.

To address this, I thought about changing the default of connection-timeout to aleph.netty/default-connect-timeout, too, but then you could sometimes end up with Netty's ConnectTimeoutException and sometimes with Aleph's ConnectionTimeoutException depending on who wins the race. Another option could be to enforce that connection-timeout < connect-timeout but that could of course still be racey if the difference is very low. Yet another option would be to wait for HTTP client request cancellation to be ready. Then we could set the default pool's connect-timeout to 0 (= infinite) and handle it completely on our end via connection-timeout (which would then cancel the connection attempt). Thoughts?

@KingMob
Copy link
Collaborator

KingMob commented Feb 15, 2024

I thought for a hot minute the two exceptions would share a useful ancestor class, but nope.

Maybe we should handle both exception classes in out catches, and set the netty timeout value from the aleph value.

src/aleph/http.clj Outdated Show resolved Hide resolved
src/aleph/netty.clj Outdated Show resolved Hide resolved
@DerGuteMoritz
Copy link
Collaborator Author

I thought for a hot minute the two exceptions would share a useful ancestor class, but nope.

🙈

Maybe we should handle both exception classes in out catches, and set the netty timeout value from the aleph value.

Good point, that makes users' lives easier and neatly solves the problem that they would sometimes get one or the other when the two timeouts are close to each other 👍 Also went ahead and changed the default :connection-timeout from 60s to Netty's default, see b000d98.

BTW notice how we had (-> ... (d/catch TimeoutException ...) (d/catch ...)) before. Since the first catch returned a d/error-deferred again, the second catch would also always be triggered, resulting in duplicate logging of timeout exceptions.

While at it: if you agree with my rationale here, how about we turn the log/error calls into log/trace or completely drop them?

@KingMob
Copy link
Collaborator

KingMob commented Feb 19, 2024

While at it: if you agree with my rationale #714 (comment), how about we turn the log/error calls into log/trace or completely drop them?

There's no universal way to please everyone with logging. But I tend to view it as a signal to the reader as to what to do. TRACE is likely to be invisible by default. ERROR indicates "do something", and is maybe too strong for something partially outside the operator's control, like timeouts. INFO/DEBUG don't necessarily indicate a problem at all. Perhaps WARNING is the best.

@DerGuteMoritz
Copy link
Collaborator Author

DerGuteMoritz commented Feb 19, 2024

There's no universal way to please everyone with logging. But I tend to view it as a signal to the reader as to what to do. TRACE is likely to be invisible by default. ERROR indicates "do something", and is maybe too strong for something partially outside the operator's control, like timeouts. INFO/DEBUG don't necessarily indicate a problem at all. Perhaps WARNING is the best.

Generally agreed, but in this particular case, I find the logging to be redundant because we communicate the exact same thing via the exception in the deferred. So given that, I still vote to either drop the log here completely or demote it to trace for debugging purposes -- though it would still be redundant since the stacktrace already points to the exception's origin anyway. Convinced? 😄

Note that the situation here is different from the TCP client where the only thing we can currently do on any kind of error is to close the stream -- there, the default error logging is definitely necessary to convey this information to the user at least somehow. (FWIW, a proper API for communicating error causes to the user is on my list of future improvements for the TCP client).

Netty has a default connect timeout for client channels (currently 30s) which was only accessible by
means of `bootstrap-transform`. Since it's a rather important option to get predictable connection
behavior, expose it as `connect-timeout` in all relevant APIs.
This should make callers' lives a little easier...
We already do the same for the default shutdown timeout and it's probably not gonna change a lot
anyway ;-)
…efault-connect-timeout, too

Not very useful in general but at least makes more sense in combination with the pool's default
:connect-timeout of the same duration.
... and improve docstrings of it and default-connect-timeout some.
@DerGuteMoritz DerGuteMoritz force-pushed the expose-netty-connect-timeout branch from 282c219 to 8c547c0 Compare April 5, 2024 13:26
Logging these as error by default is redundant with the excpetion itself which we propagate through
the error-deferred.
@DerGuteMoritz DerGuteMoritz merged commit 9ef3fb6 into master Apr 5, 2024
1 check passed
@DerGuteMoritz DerGuteMoritz deleted the expose-netty-connect-timeout branch April 5, 2024 13:46
@DerGuteMoritz
Copy link
Collaborator Author

FTR: I went ahead and demoted the error logs as suggested above now.

@arnaudgeiser
Copy link
Collaborator

It makes sense, thank you @DerGuteMoritz!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants