Check failed to run: execution expired #82
Comments
@dhdanno I will try to reproduce this locally.
So far I have not been able to reproduce this locally using Ruby 2.3 or Ruby 2.4. I am using this command:
I have not been able to get it to take even 0.4 seconds. Is your endpoint publicly reachable so that I can test against it? Can you replicate this behavior outside of the context of Sensu?
The checks we have run every 60 seconds, but this issue only occurs perhaps once every few days or less. The system is running 30 of these checks against remote endpoints. I'll see if I can replicate it.
Could it be something endpoint-specific? Maybe try monitoring that endpoint and see if you can replicate it. I have tried several thousand times now and have been unable to replicate anything that indicates an issue outside of the context of Sensu. I need to upgrade my work environment to use newer versions of this gem (I just looked and I am pinned on 0.2.1, so it's quite old) to see if I can replicate it in the context of Sensu.
I have simulated a slow nginx server with
And this successfully triggers the timeout, but I'm not seeing the error: the correct "CheckHttp CRITICAL: Request timed out" is returned. I'm not sure that the endpoint being slow to respond is the real issue. It might be worth parallelizing this test into thousands of requests to see if any come back with the error.
Hmm, unfortunately this is probably going to be pretty hard to troubleshoot. When I have some time I can try setting up a load test against some endpoint and see if I can replicate it.
I've been able to reproduce this by running the latest version of the check standalone against any non-responsive endpoint. The funny thing is, when I increase the timeout it seems to return properly. Regular:
200 second timeout:
Hmm, that is indeed odd. Can you reproduce this without using the embedded Sensu Ruby? I tried this locally and got what I would have expected:
I installed Bundler on a fresh system and, after managing to resolve dependencies, it seemed to work fine. So I guess it must be something with the Ruby installation on the system.
Interesting...
This error could occur if DNS resolution fails: Ruby's Timeout.timeout() expires and obscures the reason for the failure. With the default resolver config on Linux (timeout 5s, 2 attempts) and the default max_retries of 1 in Net::HTTP (not configurable in Ruby < 2.5.0), you end up making 4 requests, so you need a timeout setting of more than 20 seconds to get a name resolution error. Is the Timeout.timeout() call at https://github.com/sensu-plugins/sensu-plugins-http/blob/master/bin/check-http.rb#L254 needed? All the Net::HTTP timeouts are being explicitly set, and I think that all the "correct" Request timed out errors are coming from that.
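To make the arithmetic in the comment above concrete, here is a small sketch. The constant and method names are hypothetical, and it assumes a single nameserver and that every lookup attempt runs to its full timeout; the numbers are the ones quoted in the comment, not measurements.

```ruby
# Values quoted in the comment above (assumed defaults, not measured).
RESOLVER_TIMEOUT_S = 5 # resolv.conf "timeout" option, seconds per attempt
RESOLVER_ATTEMPTS  = 2 # resolv.conf "attempts" option
HTTP_MAX_RETRIES   = 1 # Net::HTTP max_retries default mentioned above

# Hypothetical helper: worst-case seconds spent on failing DNS lookups
# before a name resolution error can reach the caller.
def worst_case_dns_seconds(timeout_s, attempts, max_retries)
  lookups_per_request = attempts
  requests = 1 + max_retries # the first try plus the retries
  timeout_s * lookups_per_request * requests
end

total = worst_case_dns_seconds(RESOLVER_TIMEOUT_S, RESOLVER_ATTEMPTS, HTTP_MAX_RETRIES)
puts "Worst-case DNS wait: #{total}s"
```

Under these assumptions, any check timeout at or below that total would expire before the resolver gives up, which is consistent with the "more than 20 seconds" figure above.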
Even without using Timeout.timeout(), a slow DNS failure will be caught by Net::HTTP's open_timeout. I'm going to raise a PR to better capture the Net::HTTP errors, to at least give a clue as to where the issue may be. Ultimately, though, whether name resolution failures are visible in the check output comes down to the user's DNS timeout settings (options timeout:1 in resolv.conf).
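As an illustration of what "better capturing" the errors might look like, here is a rough sketch. It is not the code from the PR; the method name classify_http_failure and the message wording are made up for this example. It only relies on Net::OpenTimeout, Net::ReadTimeout and SocketError being separate exception classes that can be rescued individually.

```ruby
require 'net/http'
require 'socket'

# Hypothetical helper: run the block and report which phase failed,
# instead of collapsing every failure into one generic timeout message.
def classify_http_failure
  yield
  'OK'
rescue Net::OpenTimeout => e
  # Raised while opening the connection.
  "CRITICAL: connection open timed out (#{e.class})"
rescue Net::ReadTimeout => e
  # Raised while waiting for response data.
  "CRITICAL: read timed out (#{e.class})"
rescue SocketError => e
  # Name resolution failures from getaddrinfo surface as SocketError.
  "CRITICAL: socket error (#{e.message})"
end

# Simulated failures, so the sketch runs without touching the network.
puts classify_http_failure { raise Net::OpenTimeout }
puts classify_http_failure { raise SocketError, 'getaddrinfo: Name or service not known' }
```

In a real check the block would wrap the Net::HTTP request itself; the simulated raises here only stand in for that call.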
I responded to the PR. I don't think we need it, but removing it is a breaking change, as it acts as a high-level circuit breaker and supersedes the HTTP timeouts, which are all currently configured to be the same value. There are some paths forward, but all the roads I can think of lead to a breaking change. See the linked PR for more details.
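The masking effect discussed in this thread can be shown with a self-contained sketch. This is not the plugin's code: slow_failing_lookup and run_check are hypothetical names, and the slow DNS failure is simulated with sleep so that nothing touches the network.

```ruby
require 'timeout'
require 'socket'

# Hypothetical stand-in for a name lookup that is slow and then fails.
def slow_failing_lookup(delay_s)
  sleep(delay_s)
  raise SocketError, 'getaddrinfo: Name or service not known'
end

# The outer Timeout.timeout plays the role of the high-level circuit breaker.
def run_check(outer_timeout_s, lookup_delay_s)
  Timeout.timeout(outer_timeout_s) do
    slow_failing_lookup(lookup_delay_s)
  end
  'OK'
rescue Timeout::Error => e
  # The outer timer fired first, so the underlying cause is never seen.
  "timeout: #{e.message}"
rescue SocketError => e
  # The lookup failed before the outer timer, so the real cause is visible.
  "#{e.class}: #{e.message}"
end

puts run_check(0.05, 0.5) # outer timeout shorter than the failing lookup
puts run_check(1.0, 0.05) # outer timeout longer than the failing lookup
```

With the short outer timeout only the generic "execution expired" message from Timeout::Error is reported; with the longer one the SocketError gets through. That matches the earlier observation that raising the timeout makes the check return properly.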
We still see this issue occasionally even after upgrading to sensu 1.0, sensu-plugin-1.4.5 with http-2.5.0
This check took 15 seconds to run. Expected to have taken less than 1
command: "check-http.rb -u https://domain/path --response-code 302 -r"