Expose Netty's CONNECT_TIMEOUT_MILLIS client channel option #715

DerGuteMoritz · 2024-02-14T16:04:42Z

Netty has a default connect timeout for client channels (currently 30s) which was only accessible by means of bootstrap-transform. Since it's a rather important option to get predictable connection behavior, expose it as connect-timeout in all relevant APIs.

This patch came about while working on connection establishment cancellation in the context of #714. Turns out that cancelling a connect promise currently has no effect in Netty. Instead, the connection would stick around for roughly 30s. Now I was using the HTTP client for this particular test whose connection-timeout defaults to 60s so this didn't line up. Grepping Netty's source code for "30000" quickly lead me to the DEFAULT_CONNECT_TIMEOUT constant. This of course means that the HTTP client's default of 60s is practically ineffective since Netty's default connect timeout would always fire first! The test case I include in the patch also demonstrates this.

To address this, I thought about changing the default of connection-timeout to aleph.netty/default-connect-timeout, too, but then you could sometimes end up with Netty's ConnectTimeoutException and sometimes with Aleph's ConnectionTimeoutException depending on who wins the race. Another option could be to enforce that connection-timeout < connect-timeout but that could of course still be racey if the difference is very low. Yet another option would be to wait for HTTP client request cancellation to be ready. Then we could set the default pool's connect-timeout to 0 (= infinite) and handle it completely on our end via connection-timeout (which would then cancel the connection attempt). Thoughts?

KingMob · 2024-02-15T10:22:02Z

I thought for a hot minute the two exceptions would share a useful ancestor class, but nope.

Maybe we should handle both exception classes in out catches, and set the netty timeout value from the aleph value.

src/aleph/http.clj

src/aleph/netty.clj

DerGuteMoritz · 2024-02-16T16:19:54Z

I thought for a hot minute the two exceptions would share a useful ancestor class, but nope.

🙈

Maybe we should handle both exception classes in out catches, and set the netty timeout value from the aleph value.

Good point, that makes users' lives easier and neatly solves the problem that they would sometimes get one or the other when the two timeouts are close to each other 👍 Also went ahead and changed the default :connection-timeout from 60s to Netty's default, see b000d98.

BTW notice how we had (-> ... (d/catch TimeoutException ...) (d/catch ...)) before. Since the first catch returned a d/error-deferred again, the second catch would also always be triggered, resulting in duplicate logging of timeout exceptions.

While at it: if you agree with my rationale here, how about we turn the log/error calls into log/trace or completely drop them?

KingMob · 2024-02-19T11:15:17Z

While at it: if you agree with my rationale #714 (comment), how about we turn the log/error calls into log/trace or completely drop them?

There's no universal way to please everyone with logging. But I tend to view it as a signal to the reader as to what to do. TRACE is likely to be invisible by default. ERROR indicates "do something", and is maybe too strong for something partially outside the operator's control, like timeouts. INFO/DEBUG don't necessarily indicate a problem at all. Perhaps WARNING is the best.

DerGuteMoritz · 2024-02-19T13:20:12Z

There's no universal way to please everyone with logging. But I tend to view it as a signal to the reader as to what to do. TRACE is likely to be invisible by default. ERROR indicates "do something", and is maybe too strong for something partially outside the operator's control, like timeouts. INFO/DEBUG don't necessarily indicate a problem at all. Perhaps WARNING is the best.

Generally agreed, but in this particular case, I find the logging to be redundant because we communicate the exact same thing via the exception in the deferred. So given that, I still vote to either drop the log here completely or demote it to trace for debugging purposes -- though it would still be redundant since the stacktrace already points to the exception's origin anyway. Convinced? 😄

Note that the situation here is different from the TCP client where the only thing we can currently do on any kind of error is to close the stream -- there, the default error logging is definitely necessary to convey this information to the user at least somehow. (FWIW, a proper API for communicating error causes to the user is on my list of future improvements for the TCP client).

Netty has a default connect timeout for client channels (currently 30s) which was only accessible by means of `bootstrap-transform`. Since it's a rather important option to get predictable connection behavior, expose it as `connect-timeout` in all relevant APIs.

This should make callers' lives a little easier...

We already do the same for the default shutdown timeout and it's probably not gonna change a lot anyway ;-)

…efault-connect-timeout, too Not very useful in general but at least makes more sense in combination with the pool's default :connect-timeout of the same duration.

... and improve docstrings of it and default-connect-timeout some.

Logging these as error by default is redundant with the excpetion itself which we propagate through the error-deferred.

DerGuteMoritz · 2024-04-05T13:46:48Z

FTR: I went ahead and demoted the error logs as suggested above now.

arnaudgeiser · 2024-04-05T14:01:33Z

It makes sense, thank you @DerGuteMoritz!

DerGuteMoritz requested review from arnaudgeiser and KingMob as code owners February 14, 2024 16:04

arnaudgeiser reviewed Feb 15, 2024

View reviewed changes

src/aleph/http.clj Outdated Show resolved Hide resolved

src/aleph/netty.clj Outdated Show resolved Hide resolved

arnaudgeiser approved these changes Feb 19, 2024

View reviewed changes

DerGuteMoritz added 6 commits April 5, 2024 15:14

Wrap Netty's ConnectTimeoutException in our ConnectionTimeoutException

ebed7ae

This should make callers' lives a little easier...

Copy Netty's DEFAULT_CONNECT_TIMEOUT

41f1b07

We already do the same for the default shutdown timeout and it's probably not gonna change a lot anyway ;-)

Set aleph.http/request's default :connection-timeout to aleph.netty/d…

5853c46

…efault-connect-timeout, too Not very useful in general but at least makes more sense in combination with the pool's default :connect-timeout of the same duration.

Make aleph.netty/default-shutdown-timeout officially public

b0a74ca

... and improve docstrings of it and default-connect-timeout some.

Correct connection-timeout docstring in def-http-method, too

8c547c0

DerGuteMoritz force-pushed the expose-netty-connect-timeout branch from 282c219 to 8c547c0 Compare April 5, 2024 13:26

Demote connection establishment failure logging

262d3da

Logging these as error by default is redundant with the excpetion itself which we propagate through the error-deferred.

DerGuteMoritz merged commit 9ef3fb6 into master Apr 5, 2024
1 check passed

DerGuteMoritz deleted the expose-netty-connect-timeout branch April 5, 2024 13:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose Netty's CONNECT_TIMEOUT_MILLIS client channel option #715

Expose Netty's CONNECT_TIMEOUT_MILLIS client channel option #715

DerGuteMoritz commented Feb 14, 2024

KingMob commented Feb 15, 2024

DerGuteMoritz commented Feb 16, 2024

KingMob commented Feb 19, 2024

DerGuteMoritz commented Feb 19, 2024 •

edited

Loading

DerGuteMoritz commented Apr 5, 2024

arnaudgeiser commented Apr 5, 2024

Expose Netty's CONNECT_TIMEOUT_MILLIS client channel option #715

Expose Netty's CONNECT_TIMEOUT_MILLIS client channel option #715

Conversation

DerGuteMoritz commented Feb 14, 2024

KingMob commented Feb 15, 2024

DerGuteMoritz commented Feb 16, 2024

KingMob commented Feb 19, 2024

DerGuteMoritz commented Feb 19, 2024 • edited Loading

DerGuteMoritz commented Apr 5, 2024

arnaudgeiser commented Apr 5, 2024

DerGuteMoritz commented Feb 19, 2024 •

edited

Loading