-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Timeout handling and improves obscuring of DNS failure messages being swallowed by the base timeout (circuit breaker) #123
Improve Timeout handling and improves obscuring of DNS failure messages being swallowed by the base timeout (circuit breaker) #123
Conversation
Thanks for your contribution to Sensu plugins! Without people like you submitting PRs we couldn't run the project. I will review it shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good but currently this is a breaking as it removes a public interface which we can add backwards compatibility pretty easily. Also can you please add a changelog?
Thanks for your awesome work.
bin/check-http.rb
Outdated
@@ -251,9 +251,11 @@ def run | |||
end | |||
|
|||
begin | |||
Timeout.timeout(config[:timeout]) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside acquire_resource
below we should use this to still provide a way for people to account for slow overloaded web servers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout options to Net::HTTP are already being set to the timeout value passed as config:
sensu-plugins-http/bin/check-http.rb
Lines 282 to 286 in ca3b5d3
http.read_timeout = config[:timeout] | |
http.open_timeout = config[:timeout] | |
http.ssl_timeout = config[:timeout] | |
http.continue_timeout = config[:timeout] | |
http.keep_alive_timeout = config[:timeout] |
Hi, thanks for looking at this so quickly. I don't think this is a breaking change, the timeout command line option is still being used to set the Net::HTTP timeout options. Updated the changelog as requested |
I need to sit down and look at it again to eval it for breaking changes. Will try to get to it tonight. |
First off I want to thank you for your time and apologies it took me a couple of days to get back to this. It's definitely a breaking change but one I agree that we should make, now the real question is how to get there. In it's current form it would timeout at the |
OK, I agree with your point about the changed behaviour. Keeping Timeout.timeout() will always mask the DNS failures, as these are not caught by the Net::HTTP open_timeout. I tested setting the open timeout to 1 second and confirmed this. So the only way forward I see is to set explicit open and read timeout values as unique CLI options. Let me know what you think and I'll implement it. |
Ya I think we should expose those options, we could either do this as one or more changes
We could use the ruby resolver rather than relying on the OS resolver so we can rescue those. We could expose the DNS server to use so that it could be set to still use the local resolver that may be in place (such as |
5068346
to
9c1dce1
Compare
Hi, That's really interesting that you can use the ruby resolver instead, and set the timeouts via that. I didn't know you could do that. So I've updated the PR accordingly to use that method, and to add the open and read timeout values via CLI options. If the DNS request times out, we can't catch that error directly as it's caught by Net::HTTP. But the resulting error is much more useful for debugging:
I set the DNS timeout to 800ms, so that it's always lower than the lowest value for --timeout. But perhaps this too should be configurable? |
9c1dce1
to
e56904e
Compare
The changed resolver behaviour has broken some tests, I'll look into that |
You can rescue the ruby DNS timeout via https://ruby-doc.org/stdlib-2.4.0/libdoc/resolv/rdoc/Resolv/ResolvTimeout.html additionally you can use https://ruby-doc.org/stdlib-2.4.0/libdoc/resolv/rdoc/Resolv/ResolvError.html as a more generic DNS resolution exception class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better this now no longer is really a breaking change, I do have some additional suggestions/improvements to make. Would you be willing to add a test for a DNs timeout? We should be able to trigger it to setting a ridiculously low timeout (something like 0.0001
seconds) here: https://github.com/sensu-plugins/sensu-plugins-http/blob/master/test/integration/helpers/serverspec/check-http-shared_spec.rb if not I can look at writing it once you have everything else updated.
bin/check-http.rb
Outdated
@@ -250,10 +263,19 @@ def run | |||
config[:port] ||= config[:ssl] ? 443 : 80 | |||
end | |||
|
|||
# Use Ruby DNS Resolver and set DNS resolution timeout to 800ms | |||
dns_resolver = Resolv::DNS.new | |||
dns_resolver.timeouts = 0.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya let's expose that as an option so its configurable and make sure to cast it as a float.
CHANGELOG.md
Outdated
@@ -4,6 +4,9 @@ This project adheres to [Semantic Versioning](http://semver.org/). | |||
This CHANGELOG follows the format listed [here](https://github.com/sensu-plugins/community/blob/master/HOW_WE_CHANGELOG.md) | |||
|
|||
## [Unreleased] | |||
### Changed | |||
- `check-http.rb`: Add options to set open-timeout and read-timeout for Net:HTTP. Improve output on what Net::HTTP timeout was encountered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally I'd like to see the option names being called out here so someone reading this knows exactly what to update to leverage.
For example:
## [Unreleased]
### Added
- `check-http.rb`: Add options to set `--open-timeout` and `--read-timeout` for Net:HTTP. Additionally rescue `Net::OpenTimeout` and `Net::ReadTimeout` exception classes (@mygithubuser)
- `check-http.rb`: exposed `--dns-timeout` and rescue on `Resolv::ResolvTimeout` exception class (@mygithubuser)
### Changed
- `check-http.rb`: switched to using rubies DNS resolver (which looks at `/etc/resolv.conf`) and rescue `Resolv::ResolvError` exception class for generic DNS failures (@mygithubuser)
bin/check-http.rb
Outdated
@@ -250,10 +263,19 @@ def run | |||
config[:port] ||= config[:ssl] ? 443 : 80 | |||
end | |||
|
|||
# Use Ruby DNS Resolver and set DNS resolution timeout to 800ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment when you expose it as a var.
@@ -20,3 +20,5 @@ mkmf.log | |||
#Intellij files | |||
*.iml | |||
*.idea | |||
|
|||
vendor/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
It also appears that this breaks DNS resolution according to the tests which is curious because it mostly uses
I will need to take a closer look at why its failing in the tests. |
I was not able to replicate this behavior in a container but I did notice that in the container it returned an ipv6 instead of an ipv4 address:
I don't think this is an ipv4 vs 6 issue but I will need to do some more testing to figure out whats going on. |
Interestingly enough it seems to fail with the same error about localhost not being resolved even when specifying the ipv4 loopback address:
Outside of the code it works as expected:
|
I need to step away to regain some sanity but I am not 100% sure if its caused by the tests being run in docker with a broken ipv6 or if it breaks with all ipv6 setups. I assume its the former, I don't really do anything with ipv6 so I am not sure where to go from here. If you have a hostname that resolves to ipv6 and can validate that it works that would be appreciated and then I can figure out how to disable ipv6 in the container (probably easiest would be to just comment it out like I did while messing around).
|
Here is the current diff of my changes/debug code: $ git diff
diff --git a/bin/check-http.rb b/bin/check-http.rb
index cda472b..5d0fd0b 100755
--- a/bin/check-http.rb
+++ b/bin/check-http.rb
@@ -264,14 +264,26 @@ class CheckHttp < Sensu::Plugin::Check::CLI
end
# Use Ruby DNS Resolver and set DNS resolution timeout to 800ms
+ hosts_resolver = Resolv::Hosts.new
dns_resolver = Resolv::DNS.new
dns_resolver.timeouts = 0.8
- Resolv::DefaultResolver.replace_resolvers([dns_resolver])
+ Resolv::DefaultResolver.replace_resolvers(
+ [
+ hosts_resolver,
+ dns_resolver,
+ ]
+ )
+ resolved_hostname = Resolv.getaddress(config[:host])
+ p resolved_hostname
begin
Timeout.timeout(config[:timeout]) do
- acquire_resource
+ acquire_resource(resolved_hostname)
end
+ rescue Resolv::ResolvTimeout => e
+ critical "Unable to resolve DNS in a timely manner: #{e}"
+ rescue Resolv::ResolvError => e
+ critical "Unable to resolve DNS: #{e}"
rescue Net::OpenTimeout
critical 'Request timed out opening connection'
rescue Net::ReadTimeout
@@ -283,23 +295,26 @@ class CheckHttp < Sensu::Plugin::Check::CLI
end
end
- def acquire_resource
+ def acquire_resource(endpoint)
http = nil
-
+ # if endpoint.start_with?('::')
+ # endpoint = "\'[#{endpoint}]\'"
+ # end
+ p endpoint
if config[:no_proxy]
- http = Net::HTTP.new(config[:host], config[:port], nil, nil)
+ http = Net::HTTP.new(endpoint, config[:port], nil, nil)
elsif config[:proxy_url]
proxy_uri = URI.parse(config[:proxy_url])
if proxy_uri.host.nil?
unknown 'Invalid proxy url specified'
end
http = if proxy_uri.user && proxy_uri.password
- Net::HTTP.new(config[:host], config[:port], proxy_uri.host, proxy_uri.port, proxy_uri.user, proxy_uri.password)
+ Net::HTTP.new(endpoint, config[:port], proxy_uri.host, proxy_uri.port, proxy_uri.user, proxy_uri.password)
else
- Net::HTTP.new(config[:host], config[:port], proxy_uri.host, proxy_uri.port)
+ Net::HTTP.new(endpoint, config[:port], proxy_uri.host, proxy_uri.port)
end
else
- http = Net::HTTP.new(config[:host], config[:port])
+ http = Net::HTTP.new(endpoint, config[:port])
end
http.read_timeout = config[:read_timeout]
http.open_timeout = config[:open_timeout]
diff --git a/test/fixtures/bootstrap.sh b/test/fixtures/bootstrap.sh
index 8816812..048dd01 100755
--- a/test/fixtures/bootstrap.sh
+++ b/test/fixtures/bootstrap.sh
@@ -37,6 +37,7 @@ fi
echo "
server {
listen 80;
+ listen [::]:80 ipv6only=on;
location /okay {
limit_except GET { |
I started working on this but ran out of time, I won't have a chance to pick it up again for a couple of days. But I commited and pushed the current WIP
I've also added a 'Host' header to Net::HTTP requests as otherwise the request may not succeed for a virtual host, given we are now making the HTTP request via IP address. I also had problems with localhost resolving to The test I added to validate DNS timeouts isn't working, the resolution doesn't timeout because the lookup for localhost comes from the host resolver. I tried using google.com and it succeeded too. How can we simulate a DNS timeout in these conditions? I was looking at using tc to simulate a network delay but this requires more config to work successfully within docker containers. |
Probably need to use some kind of unresponsive server for example |
e075be6
to
36dd335
Compare
Hi, I've made some more updates: Updated tests to use I removed the suggested step to manually resolve the hostname and pass the IP address to
So I don't think we can explicitly capture the DNS lookup failures. However, the error returned from the check is still useful and incidicative of the problem, i.e.
The DNS timeout test is working when setting a suitably low resolution timeout. I've also updated the changelog as requested. I think this is the best we can do with this, but perhaps there's a better solution for the localhost issue with the tests. Thanks! |
@johanek awesome I will take a closer look this evening, the other option I was thinking about was updating the bootstrap script to run a |
I'm happy with where this is now, thanks for your feedback and suggestions to improve my original request 👍 |
@johanek thanks for all your awesome work and being willing to explore the better vs the easier path with me. Apologies on the delay for getting this merged and released, I had a lot going on during that time frame and things should be back to whatever "normal" means. I have cut a release and when it makes it through the CI/CD pipeline and creates an artifact I will update here. |
Thanks for your assistance @majormoses, it was a pleasure working through it with you 😄 |
…ture Net::HTTP errors
Pull Request Checklist
Related to comments in #82
General
- [ ] Update README with any necessary configuration snippets- [ ] Binstubs are created if neededRuboCop passes
Existing tests pass
Purpose
Improve error reporting related to Net::HTTP timeouts