Replies: 3 comments
-
Hi @cyrus-mc, thanks for raising this concern. This seems better suited to a GitHub issue than a discussion. Can you please open an issue with the following information:
With regard to not being able to connect to the node to see logs, you can try using the
-
Do you also have other host metrics, such as memory consumption? I'm trying to see whether this is related to issue #4075.
-
Closing and continuing discussion in #4075.
-
I am experiencing issues where a node goes offline (Kubelet NotReady). This cluster and its nodes are part of our CI/CD system, which is used to run self-hosted GitHub runners. Troubleshooting is complicated by the fact that when the node gets into this state, I am unable to connect to it via SSM. AWS CloudWatch metrics show that the CPU is pegged at 100%, and the last log messages that make it out of the system and to our log aggregation system are:
Given that I am unable to connect to the machine in any way, I am not sure how to go about discovering what is causing the instance to die.