-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optionally use generated api_key auth to ES #1098
Conversation
This commit adds a optional client arg, use_api_key, which will generate and use an api key for every client and async client created by Rally. It uses an initial call via the existing auth scheme to ES to generate the key, and then creates a new client using the api_key for general use by Rally. The async client also does this, and temporarily creates a sync client to generate the key, and then uses it when creating an async client. Closes elastic#1067
@danielmitterdorfer I wanted to get an eyeball from you before I go down the path of unit/integration tests, just to ensure that this is the correct approach. Mainly the async client work. TY in advance! The easiest way to test this is with elastic/rally-teams#57, which updates the audit log to emit the key id/name. After installing/starting a 7.9.3 ES instance using the local PR of rally teams, I used the following client options to test.
The only audit log messages that did not contain the apikey (
All other events in the audit log were associated w/ a key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the proposal and the great and detailed reproduction scenario. :)
I left a few notes and suggestions.
esrally/client.py
Outdated
# the reason the options were not removed in place was because of the logic to coalesce the key tuple into a | ||
# header that the client could use. | ||
api_key_response = instance.security.create_api_key({"name": "rally-api-key"}) | ||
self.client_options.pop("http_auth") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it possible to use API keys without prior basic auth but pop
raises a KeyError
when the key is not present, so we should maybe do self.client_options.pop("http_auth", None)
here, see docs.
@@ -129,7 +129,17 @@ def _is_set(self, client_opts, k): | |||
|
|||
def create(self): | |||
import elasticsearch | |||
return elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options) | |||
instance = elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options) | |||
if self._is_set(self.client_options, "use_api_key"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also enter the block if the user specified use_api_key:false
on the client options. I noticed that we fall into this trap in other places in this class as well and I propose we change this line to:
if self.client_options.get("use_api_key", False):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method is slightly misleading. It does not check if its set, it just removes the possibility of a key error. It returns the value, if present. Adding a test like so confirms this (and passes).
def test_is_set(self):
hosts = [{"host": "127.0.0.1", "port": 9200}]
client_options = {}
# make a copy so we can verify later that the factory did not modify it
original_client_options = dict(client_options)
f = client.EsClientFactory(hosts, client_options)
opts = {"foo": True, "bar": False}
assert f._is_set(opts, "foo") is True
assert f._is_set(opts, "bar") is False
assert f._is_set(opts, "baz") is False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the semantics of the predicate is_set
are misleading and maybe we should change it. I still propose we implement this line here differently:
if self.client_options.get("use_api_key", False):
This ensures correct behavior depending on the value of use_api_key
.
esrally/client.py
Outdated
@@ -139,6 +149,10 @@ def create_async(self): | |||
|
|||
from elasticsearch.serializer import JSONSerializer | |||
|
|||
if self._is_set(self.client_options, "use_api_key"): | |||
# Temporarily create a non async version of the client to generate an api_key and put it into client_options | |||
self.create() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would work fine but how about we structure this a bit differently and instead encapsulate client option post processing into a dedicated method? Here is a summary of all my proposals (including earlier review comments:
diff --git a/esrally/client.py b/esrally/client.py
index ff92b75..ce76ccb 100644
--- a/esrally/client.py
+++ b/esrally/client.py
@@ -128,6 +128,22 @@ class EsClientFactory:
return False
def create(self):
+ self._postprocess_client_options()
+ return self._create_sync_client()
+
+ def _postprocess_client_options(self):
+ if self.client_options.get("use_api_key", False):
+ es = None
+ try:
+ es = self._create_sync_client()
+ api_key_response = es.security.create_api_key({"name": "rally-api-key"})
+ self.client_options.pop("http_auth", None)
+ self.client_options["api_key"] = (api_key_response["id"], api_key_response["api_key"])
+ finally:
+ if es:
+ es.close()
+
+ def _create_sync_client(self):
import elasticsearch
return elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options)
@@ -163,6 +179,7 @@ class EsClientFactory:
# ensure that we also stop the timer when a request "ends" with an exception (e.g. a timeout)
trace_config.on_request_exception.append(on_request_end)
+ self._postprocess_client_options()
# override the builtin JSON serializer
self.client_options["serializer"] = LazyJSONSerializer()
self.client_options["trace_config"] = trace_config
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach better 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted two architectural issues that we need to solve otherwise. Let's discuss this in a high-bandwidth conversation.
self._generate_api_key() | ||
|
||
def _generate_api_key(self): | ||
with self._create_sync_client() as instance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya i figured you would like that ;)
@@ -129,7 +129,17 @@ def _is_set(self, client_opts, k): | |||
|
|||
def create(self): | |||
import elasticsearch | |||
return elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options) | |||
instance = elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options) | |||
if self._is_set(self.client_options, "use_api_key"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the semantics of the predicate is_set
are misleading and maybe we should change it. I still propose we implement this line here differently:
if self.client_options.get("use_api_key", False):
This ensures correct behavior depending on the value of use_api_key
.
return elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options) | ||
|
||
def create(self): | ||
self._postprocess_client_options() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two issues with this that I missed when first eyeballing your draft PR:
- This circumvents the logic to wait for the REST layer to be available. This will work if you spin up Elasticsearch yourself because by the time you start Rally, Elasticsearch is likely up, but it won't work when Rally spins up the cluster. The reason is that we first create the client, then use this client to wait for the REST API layer and only then continue with the benchmark (see
Driver#prepare_benchmark()
for details). Reproduction scenario:esrally --distribution-version=7.9.3 --car=defaults,trial-license,x-pack-security --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password',use_api_key:true"
- This will only create one API key per process but not per simulated client. Say, you start a bulk indexing task on an 8 core machine with 256 clients. Rally will start 8 load generator processes (because of the 8 cores) and each of them will retrieve an API key. Each of the load generator processes will simulate 256/8 = 32 clients and each of these clients will use the same API key. However, a user will likely expect that each client uses their own API key (i.e. retrieve 256 API keys).
Given these two issues I sense that we need a different approach. One option would be to insert a dedicated task at the beginning of a schedule but I sense that's the wrong approach either because the track should be agnostic to how clients authenticate. Maybe let's discuss this on a high-bandwidth medium.
@hub-cap I left a comment with aspects to consider for an implementation on the related issue. I think the implementation that we will end up with, will be quite different from this PR so I suggest we close it. Wdyt? |
100% agree w/ closing |
This commit adds a optional client arg, use_api_key, which will generate
and use an api key for every client and async client created by
Rally. It uses an initial call via the existing auth scheme to ES to
generate the key, and then creates a new client using the api_key for
general use by Rally.
The async client also does this, and temporarily creates a sync client
to generate the key, and then uses it when creating an async client.
Closes #1067