[Question] How to assume IAM role inside the escalator pod? Getting 403 despite instructions #231
Comments
Thanks for giving Escalator a go @FilipSwiatczak! Based on the following error:
I'd say the trust relationship isn't set up correctly between the two roles to allow the node instance role to assume the Escalator role. Have a look at this page on how to allow a role to assume another role: https://nelson.cloud/aws-iam-allowing-a-role-to-assume-another-role/. It has instructions for assuming a role either in the same account or in a different account.
I'd also like to mention that instructions for configuring one role to assume another are going to be missing from our documentation, as they depend on the configuration of the end user's cluster and AWS accounts, and we can't cater for all scenarios.
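For anyone landing here with the same 403, the two pieces usually involved in letting one role assume another can be sketched roughly as below. This is only an illustration, not something from the Escalator docs: the account ID, the role names and the helper function names are made-up placeholders. The trust policy goes on the role being assumed, and the permission policy goes on the role doing the assuming.

```python
import json

# Placeholder ARNs for illustration only; substitute your own roles.
# The trust policy principal is the IAM role ARN (arn:aws:iam::...:role/...),
# not the assumed-role session ARN that appears in the AccessDenied message.
NODE_ROLE_ARN = "arn:aws:iam::111111111111:role/example-node-instance-role"
ESCALATOR_ROLE_ARN = "arn:aws:iam::111111111111:role/example-escalator-role"


def trust_policy(caller_role_arn: str) -> dict:
    """Trust policy for the role being assumed (the Escalator role here)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": caller_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_permission(target_role_arn: str) -> dict:
    """Permission policy for the calling role (the node instance role here)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": target_role_arn,
            }
        ],
    }


# Print both documents so they can be pasted into the IAM console or a template.
print(json.dumps(trust_policy(NODE_ROLE_ARN), indent=2))
print(json.dumps(assume_role_permission(ESCALATOR_ROLE_ARN), indent=2))
```

Both sides are generally needed: the caller must be allowed to call sts:AssumeRole on the target, and the target must trust the caller.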
thanks @awprice, it worked with these two changes:
So while this works, it's not fully automated, as I can't find a way to fetch from the cluster the STS role the pod starts under. I mostly raised this question to save other people time, so there is a copy-paste solution as easy as the rest of the instructions in the project README!
Also @awprice, if Escalator runs in the same node group that it controls, how can it avoid tainting its own node and forcing an Escalator re-deployment? I really can't find an answer in the docs!
I apologise if those are noobish questions, I'm not a Kubernetes expert! (yet!)
Using instance protection like:
also does not work: the node is terminated after being tainted. Though if it did work, it would probably leave Escalator stuck trying to remove the node over and over.
@FilipSwiatczak No problem!
We tend to use IAM roles for service accounts on EKS, as this will prevent the need to deal with node instance roles. This documentation from AWS gives a good introduction and steps to use them: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
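As a rough sketch of what the IAM roles for service accounts setup involves (the AWS page above is the authoritative source), the role's trust policy names the cluster's OIDC provider and the Kubernetes service account, and the service account carries an `eks.amazonaws.com/role-arn` annotation. The account ID, OIDC provider ID, namespace, service account name and function names below are all placeholders chosen for illustration.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy letting one Kubernetes service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def service_account_annotations(role_arn: str) -> dict:
    """Annotation that tells EKS which role to inject into pods using the account."""
    return {"eks.amazonaws.com/role-arn": role_arn}


# Placeholder values for illustration only.
policy = irsa_trust_policy(
    account_id="111111111111",
    oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    namespace="kube-system",
    service_account="escalator",
)
print(json.dumps(policy, indent=2))
print(service_account_annotations("arn:aws:iam::111111111111:role/example-escalator-role"))
```

With this approach the node instance role no longer needs to be part of the trust relationship at all.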
We avoid this by running multiple node groups in our clusters and running Escalator on a node group that Escalator doesn't scale, so it can't terminate the node it is running on. Escalator is primarily designed for scaling node groups that run job-based workloads, i.e. ones that will end. Escalator itself is better thought of as a service-based workload, meaning it runs forever, so it isn't really the sort of thing that should run on the node groups it is scaling.
Thank you @awprice! I followed the above link and, at the very end of the pod checks, realised the Escalator pod does not have AWS_WEB_IDENTITY_TOKEN_FILE set.
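A quick way to run that check from inside the pod is sketched below. It assumes the standard variables that EKS injects for IAM roles for service accounts, AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE; the function name is just a placeholder.

```python
import os

# Variables EKS injects into pods whose service account is annotated with a role.
REQUIRED = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_env(env=None):
    """Return the names of the expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_irsa_env()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("IRSA environment variables are present")
```

If either variable is missing, it is worth checking that the pod actually uses the annotated service account and that the pod was created after the annotation was added, since the variables are injected when the pod is created.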
It appears that when Escalator is deployed in a separate node group, with the custom label escalator:worker at both node and pod level, Escalator doesn't see any CPU or memory utilisation (it reports 0). It only works for me when it's in the same node group.
With this and the IAM injection issue I'm a bit stuck. Are there any more complete deployment examples in existence, please?
When escalator is attempting to scale node-group different to one it's deployed in, it throws:
@FilipSwiatczak Some answers to your questions:
Hello guys,
It's a wonderful project and I've almost got it working. Having followed Readme instructions in (https://github.com/atlassian/escalator/blob/master/docs/deployment/aws/README.md)
I have those ticked off:
Given all that I'm still getting 403 on attempt to assume role.
AccessDenied: User: arn:aws:sts::XXX:assumed-role/eksctl-bitbucketpipelines-nodegro-NodeInstanceRole-XXX is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXX:role/bitbucket-pipelines-escalator-role\n\tstatus code: 403
Any pointers would be much appreciated. Thank you!