Pod terminated with OOMKilled #801
Comments
@wdonne I assume you used the monolithic provider-aws package rather than the family provider ones? I'll close this issue for now, but feel free to revert back if you still run into issues. |
Hi,
resources:
  limits:
    memory: 4Gi
  requests:
    cpu: 500m
    memory: 1Gi
|
I have switched to the AWS family providers, but my |
We have 10 cores and 12GB for aws-ec2 alone, and it's not enough:
@wdonne, would you mind re-opening the issue? Update: controllerConfig arguments:
|
@gmykhailiuta I don't have permission to reopen the issue because I didn't close it myself. @jeanduplessis Would you reopen it? I think there clearly is a memory problem. |
@wdonne, @siddharthdeshmukh, @gmykhailiuta I cannot reproduce the issue in v0.40.0 with the information provided. Does this issue always occur, and which versions are you using? Please also provide us with the full |
Thank you for taking care of it, @turkenf! This issue seems to get worse the more resources we migrate to Upbound providers (ec2, iam, eks, route53). Ec2 is currently one of the most heavily used; it manages 238 resources. Could you try with 250+ resources, please? I've also noticed that most of the load comes in the first few minutes after the provider pod is created, as if all resources are polled at once; after that it evens out. It would be nice to introduce a random delay for each resource. ControllerConfig spec used by the aws-ec2 provider:
Versions in use:
|
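The per-resource random delay suggested above could look roughly like the sketch below. It is only an illustration of the idea (spreading the first poll of each resource across one poll interval), not how the provider actually schedules reconciles. The function name `jittered_delays`, the 600-second interval, and the resource names are all hypothetical; only the count of 238 comes from this thread.

```python
import random


def jittered_delays(resource_names, poll_interval_s=600.0, seed=None):
    """Return a random first-poll offset (in seconds) for every resource.

    Each offset is drawn uniformly between 0 and poll_interval_s, so the
    initial polls are spread over one interval instead of firing together
    right after the pod starts. poll_interval_s is an assumed value.
    """
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    return {name: rng.uniform(0.0, poll_interval_s) for name in resource_names}


# Hypothetical usage: 238 managed resources, as reported for aws-ec2 above.
names = [f"instance-{i}" for i in range(238)]
delays = jittered_delays(names, poll_interval_s=600.0, seed=1)

# Print the five resources that would be polled first.
for name, delay in sorted(delays.items(), key=lambda kv: kv[1])[:5]:
    print(f"{name}: first poll after {delay:.1f}s")
```

With offsets like these, the startup burst would be flattened into a roughly even stream of polls, at the cost of some resources being checked later than they otherwise would be.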
Hello @turkenf , This is the ControllerConfig resource:
and this is the live manifest of the IAM provider pod:
Note also that the provider currently manages only 19 resources. |
I now upgraded to version 0.40.0 and the problem still occurs. |
@wdonne in your case, I think the error you are getting is due to your CPU limit. Since a low CPU value causes more memory consumption, your pod is terminated with OOMKilled.
Can you try again by increasing the CPU and let us know? |
@gmykhailiuta I couldn't test your situation, but I think it can be solved by trying different combinations of limits. I also recommend checking here for more detailed information about the tests performed. Please test it with the latest provider version, and if you still think there is a problem, share the results with us in detail. |
@turkenf I first gave it 1 CPU and then 2. In both cases it was OOMKilled after a few seconds, so much faster. |
@wdonne could you please share with us the provider's logs (with debug logging enabled) and the output of |
@turkenf How can I set the loglevel to debug? This is the log I have so far:
And this is the output of
|
Hi,
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: upbound-provider-aws-ec2-controller-config
spec:
  replicas: 1
  resources:
    limits:
      memory: 8Gi
    requests:
      cpu: 2000m
      memory: 4Gi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: false
    runAsNonRoot: true
  serviceAccountName: upbound-provider-aws-ec2-sa
But now I am facing another issue where
We also ended up using more resources on the node when we migrated to the provider family instead of the monolithic provider. In our case we are currently using the following providers:
|
Seeing the same with the IAM aws family provider v0.41.0 |
This provider repo does not have enough maintainers to address every issue. Since there has been no activity in the last 90 days it is now marked as |
This issue is being closed since there has been no activity for 14 days since marking it as |
What happened?
The provider pod is often terminated with OOMKilled.
How can we reproduce it?
Run it with the memory limit set to 1Gi.
What environment did it happen in?
Remarks
By default, the resources object is empty in the pod. How much memory does the provider need?