Support ES API Keys for client operations #1067
Comments
This will be simplified greatly once we have universal API keys, i.e. cloud API keys that can be mapped to roles in ES clusters. This avoids us having to create anything. |
I see two potential modes here:
|
Do we have any idea of the permission set to be used for each key? If we do not define a fine-grained permission set, the key will default to the same permissions as the Rally basic auth user. If we do not care about exercising fine-grained access control, it is definitely easiest to let every key default to the permissions that the HTTP auth user has when creating it.
|
Given Rally is in most cases running in a privileged mode and should never be used against a production cluster, I'm personally ok with the token defaulting to the same permissions as the Rally basic auth user. I can't see any cases where we would want to test restricted permissions unless we believe it has the potential to impact performance. Even if the latter does become true, this feels like an enhancement that can be addressed in later iterations. |
This commit adds an optional client arg, use_api_key, which will generate and use an API key for every client and async client created by Rally. It makes an initial call to ES via the existing auth scheme to generate the key, and then creates a new client using the api_key for general use by Rally. The async client does the same: it temporarily creates a sync client to generate the key, and then uses that key when creating the async client. Closes elastic#1067
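The two-step flow described in the commit message can be sketched roughly as follows. This is not the code from the commit: `create_key` is a hypothetical callable standing in for the real call to the Elasticsearch create API key endpoint, and the helper names are made up for illustration. The only things taken as given are that the create API key response contains `id` and `api_key` fields and that API key auth uses an `Authorization: ApiKey <base64(id:api_key)>` header.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Header for the bootstrap call made with the existing basic auth credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def api_key_header(key_id: str, api_key: str) -> dict:
    """Header used for all later requests once the key exists."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


def build_client_headers(create_key, user: str, password: str) -> dict:
    # Step 1: use basic auth once, only to create the API key.
    # `create_key` is a hypothetical stand-in for the real HTTP call.
    response = create_key(basic_auth_header(user, password))
    # Step 2: every subsequent request authenticates with the generated key.
    return api_key_header(response["id"], response["api_key"])
```

In a real client the bootstrap connection would be closed once the key has been obtained, so that only the API-key-authenticated client takes part in the benchmark.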
Please refer to these definitions for terms that I use below:
I agree with #1067 (comment) that we need two modes, but I think we should first implement the mode where each client uses its own API key, as this is more relevant for realistic benchmarks. @hub-cap and I had a chat about this already, and the implementation in #1098 covers a scenario where each worker process (potentially simulating multiple clients) uses its own API key, i.e. we neither use a shared key for all clients nor does each client use its own key. In order to implement this, we need to consider the following aspects:
async def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=(), headers=None):
    # headers defaults to None, so fall back to an empty dict before copying
    final_headers = dict(headers or {})
    final_headers["authorization"] = self._get_api_key_header_val(api_key=retrieve_api_key_for_current_client())
    return await super().perform_request(method, url, params, body, timeout, ignore, final_headers) |
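One possible way to back `retrieve_api_key_for_current_client()` is a context variable that each simulated client sets inside its own asyncio task. The sketch below is an assumption about how that could work, not what Rally does: `RecordingTransport` is a stand-in for the real async transport, and every name other than `retrieve_api_key_for_current_client` and `perform_request` is hypothetical.

```python
import asyncio
import base64
import contextvars

# Hypothetical: holds the (id, api_key) pair of the simulated client running in the current task.
_current_api_key = contextvars.ContextVar("current_api_key")


def retrieve_api_key_for_current_client():
    return _current_api_key.get()


class RecordingTransport:
    """Stand-in for the real async transport; it only records the headers it receives."""

    def __init__(self):
        self.seen = []

    async def perform_request(self, method, url, headers=None):
        self.seen.append(dict(headers or {}))
        return {"status": 200}


class ApiKeyTransport(RecordingTransport):
    async def perform_request(self, method, url, headers=None):
        final_headers = dict(headers or {})
        key_id, secret = retrieve_api_key_for_current_client()
        token = base64.b64encode(f"{key_id}:{secret}".encode("utf-8")).decode("ascii")
        final_headers["authorization"] = f"ApiKey {token}"
        return await super().perform_request(method, url, headers=final_headers)


async def simulated_client(transport, key):
    # Each task created by asyncio.gather runs in its own copy of the context,
    # so this value is not visible to the other simulated clients.
    _current_api_key.set(key)
    await asyncio.sleep(0)  # yield so that the tasks interleave
    await transport.perform_request("GET", "/_search")


async def main():
    transport = ApiKeyTransport()
    await asyncio.gather(
        simulated_client(transport, ("id-1", "secret-1")),
        simulated_client(transport, ("id-2", "secret-2")),
    )
    return transport.seen
```

Both simulated clients share a single transport here, yet each request carries the key of the task that issued it.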
Supporting per-request |
Any update on this issue? It would be great to be able to use ApiKey authentication when running Rally against our ES cluster. |
Hi @YohanSciubukgian, I asked around internally; sorry, there hasn't been any work done here yet. Once this gets traction, the ticket will be updated. Thanks |
The upstream issue blocking this feature has been implemented in elastic/elasticsearch-py#1577 and will be available in |
API key authentication should also be supported for the ES-based metrics store. |
I'm working on this at the moment and wanted to separate some concerns and clarify scope. As initially proposed and later reiterated, the primary motivation is supporting a unique API key per simulated client. Secondarily, we should support a global, user-provided API key across all clients. I agree that we should also support API key authentication for the metrics store as suggested, but this is a bit orthogonal and requires changes to completely different code paths, so we can tackle that separately. I'll file a separate issue for this. Since the primary use case entails Rally creating an API key per simulated client, Rally will need to be provided with credentials sufficient for calling the relevant Elasticsearch APIs for creating API keys and, I'd argue, for invalidating/deleting them upon benchmark completion. This results in 3 different scenarios that we'll need to support:
To support these scenarios, we need to introduce two new client options:
Here's how each of the above scenarios would be configured using the two new client options:
We can hash out the details around naming on the PR, but I'll mention here that Happy to hear any feedback on this approach! Otherwise, the implementation will largely follow the broad strokes outlined in this comment. We'll aim to support scenario 1 first (generated API keys for client operations, basic auth otherwise) given that it's highest priority. |
+1 for this approach.
API keys created with basic authentication inherit permissions of the authenticated user, though they can be created with optional role descriptors (as mentioned in item 2 from the docs). To be clear, the intention as stated in #1067 (comment) is to create unique API keys per simulated client from a single authenticated user without additional or unique role descriptors? +1 to deleting generated API keys, but preserving the |
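To make the permission question concrete, here is a rough sketch of the request bodies involved. It assumes the Elasticsearch create API key endpoint (`POST /_security/api_key`) and the invalidate endpoint (`DELETE /_security/api_key`); the helper names and the `rally-client-N` naming scheme are hypothetical. Leaving out `role_descriptors` means the key inherits the permissions of the authenticated user.

```python
def create_api_key_body(client_id: int, role_descriptors=None) -> dict:
    """Body for POST /_security/api_key, one key per simulated client."""
    # Hypothetical naming scheme, used here only to tell the keys apart.
    body = {"name": f"rally-client-{client_id}"}
    if role_descriptors:
        # Optional: restrict the key to a subset of the creating user's permissions.
        body["role_descriptors"] = role_descriptors
    return body


def invalidate_api_keys_body(key_ids) -> dict:
    """Body for DELETE /_security/api_key, sent once the benchmark has completed."""
    return {"ids": list(key_ids)}
```

For example, `create_api_key_body(0)` yields `{"name": "rally-client-0"}`, which would create a key with the same permissions as the basic auth user.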
Yeah, that's what I'm thinking, at least for a first implementation. It's worth looking into how this is done by default by Agent, though, since the intention is to mimic its behavior. Good point.
I could see this maybe being useful if it turns out that API key generation for a large number of clients is painfully slow, but it seems like a stretch. Did you have other scenarios in mind where this would be advantageous? It shouldn't be difficult to support this, but my leaning is to file it away as a future enhancement unless we foresee a compelling need for it. What do you think? |
The approach sounds good to me too. I believe we should first focus on the first scenario in the table: generating per-client API keys with basic auth. I think it will provide 95% of the value since anything else won't be in the hot path. |
We can let this rest and see if a need surfaces. I do not see any compelling reason to add it now. |
100% agree. That's the plan! |
I could use some input on a core implementation detail that I've been thinking about and have discussed a bit with @DJRickyB. In particular, I'd love to hear your thoughts @pquentin.
As discussed previously, a key implementation detail is that a client should be able to pass its unique API key to the Elasticsearch client transparently. This implies request-level overrides of the authentication mechanism set on the Elasticsearch client instance. As noted, My initial thinking was to rely on this mechanism. This ultimately requires that we make runners aware of the API key associated with its caller and ensure that it properly passes the This works, but it does mean that runners and parameter sources would now need to concern themselves with authentication.
For default runners in the Rally code, maybe this is okay, but this could be rather trappy for authors of custom runners and/or parameter sources. They would need to make sure that that Ideally, runners shouldn't need to care about ES client options such as authentication. They currently don't need to, but per-request auth overrides as required here complicate matters.
There's probably all sorts of ways that we could approach this within Rally. But! Conveniently, This seems like exactly what we need. Instead of requiring that runners pass the Done in this manner, I think all runners--including custom ones that Rally doesn't control--should Just Work(TM) with per-client API keys. It seems like the right separation of concerns to me.
The catch is that we'd need to update the elasticsearch-py dependency to at least 8.0.0, which will require some unknown but certainly non-zero amount of work due to deprecations and such. I'm also not sure if creating an ES client instance per Rally client would cause any issues with a large number of clients and introduce a potential bottleneck within the load driver. I'd think and hope not, but we'd need to verify.
We could probably hack this general approach into Rally itself using the Assuming, of course, that this idea doesn't have a fatal flaw that I haven't considered. 😄 Hence my request for feedback. Thanks! |
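The separation of concerns argued for above can be illustrated with a small stand-in. This is only a sketch under assumptions: `SharedTransport`, `ClientView`, `with_api_key` and `run_search` are hypothetical names, not the elasticsearch-py API or Rally code. The point is that each simulated client gets its own lightweight client object carrying its key, while the runner never touches authentication and all objects share one transport.

```python
class SharedTransport:
    """Stand-in for a shared connection pool; it records each request it is asked to send."""

    def __init__(self):
        self.requests = []

    def perform_request(self, method, path, headers):
        self.requests.append((method, path, headers))
        return {"status": 200}


class ClientView:
    """Hypothetical per-client object: own auth headers, shared transport."""

    def __init__(self, transport, headers=None):
        self._transport = transport
        self._headers = dict(headers or {})

    def with_api_key(self, encoded_key):
        # Return a new view rather than mutating this one, so views stay independent.
        headers = dict(self._headers)
        headers["authorization"] = f"ApiKey {encoded_key}"
        return ClientView(self._transport, headers)

    def search(self, index):
        return self._transport.perform_request("GET", f"/{index}/_search", dict(self._headers))


def run_search(es, index):
    """A runner: it uses whatever client it is handed and knows nothing about auth."""
    return es.search(index)
```

Because the views hold only a header dict and a reference to the transport, creating one per simulated client should be cheap in this toy model; whether the same holds for real client instances under a large client count would still need to be measured, as noted above.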
💯 This is great, and I absolutely agree that this separates concerns nicely. Regarding upgrading the Elasticsearch Python client to 8.0, this is the right thing to do and long overdue, so it would be nice if we get it done, but I'm afraid "non-zero amount of work" is an understatement - this sounds like the same amount of work as this issue, if not more. Maybe you can try to work on it for a timeboxed period (1 week?) and if it turns out to be too much work, we can instead reimplement |
I had a hunch it might be pretty involved, but hadn't yet looked in depth, so thanks for confirming that.
This seems reasonable to me. |
The Elastic agent uses API keys to communicate with ES. This is the recommended way for clients to communicate with ES. We should do this by default. For benchmark-only this is trivial. For cases where we orchestrate ES we will need to create this API key using the creds. @danielmitterdorfer @dliappis