I would expect most of these accesses to work from a |
🔍 Context
The standard Node Problem Detector (NPD) is a key component of Kubernetes clusters that focus on minimizing service disruption through deep health monitoring of the underlying node operating system.
NPD is enabled by default in:
This suggests that it is a mature software component that should equally be viable for AWS EKS. However, AWS EKS does not offer this out of the box.
On AWS, one might assume that installing the Node Termination Handler (NTH) would be sufficient and make NPD unnecessary. However, as we've discovered, NTH alone falls short in the many scenarios where no AWS maintenance event is published. It turns out that NTH and NPD are complementary, with different focus areas, strengths, and weaknesses.
When one attempts to self-manage NPD on Bottlerocket and EKS, it is a struggle to get it working correctly. The challenges include the inability to install it directly on the host (as AKS and GKE do) and SELinux policy constraints when running it as a DaemonSet. Furthermore, privileged access is required to interact with host domain sockets and related resources.
💡 Idea
This idea proposes providing standard guidance for successfully installing and managing NPD in a manner that limits the security risks associated with its need to interact with components such as systemd, journalctl, and the container runtime. If upstream changes are needed to make this more secure, they can be suggested and worked on.
Another possibility is a Bottlerocket package addition that can run without privileged access. This is facilitated by NPD's support for Custom Plugin Monitors. The idea would be to develop an interface that could be used to get the results of queries such as these:
systemctl show containerd --property=InactiveExitTimestamp
crictl --runtime-endpoint=unix:///var/run/containerd/containerd.sock pods --latest
journalctl --unit <service> --since <time>
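To make the Custom Plugin Monitor angle a bit more concrete, here is a rough sketch of what a plugin wrapping the first query could look like. It relies on my understanding of the NPD custom plugin contract (exit code 0 means OK, 1 means NonOK, anything else means Unknown, with a short message on stdout). The helper names, the 80-character message cap, the 5-second timeout, and the treatment of an empty or `n/a` timestamp as "never started" are all assumptions for illustration, as is the use of `systemctl` as the command behind the `show` query. It also assumes the plugin is allowed to run that command at all, which is exactly the access question raised above.

```python
import subprocess

# Exit codes as I understand the NPD custom plugin contract (assumption).
OK, NON_OK, UNKNOWN = 0, 1, 2

# Hypothetical cap on message length; NPD truncates long plugin output.
MAX_MESSAGE_LENGTH = 80


def parse_property(output, name):
    """Return the value of a KEY=VALUE line, or None if the key is absent."""
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return value.strip()
    return None


def evaluate(output):
    """Map the text of the 'show' query to an (exit code, message) pair."""
    value = parse_property(output, "InactiveExitTimestamp")
    if value is None:
        return UNKNOWN, "InactiveExitTimestamp not reported"
    # Assumption: an unset timestamp is printed as empty or "n/a".
    if value in ("", "n/a"):
        return NON_OK, "containerd has not been started"
    return OK, f"containerd started at {value}"


def main():
    """Run the query and print a short status message for NPD to pick up."""
    command = ["systemctl", "show", "containerd",
               "--property=InactiveExitTimestamp"]
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=5, check=True)
    except (OSError, subprocess.SubprocessError) as exc:
        # Missing binary, non-zero exit, or timeout: report Unknown.
        print(f"query failed: {exc}"[:MAX_MESSAGE_LENGTH])
        return UNKNOWN
    status, message = evaluate(result.stdout)
    print(message[:MAX_MESSAGE_LENGTH])
    return status
```

As a plugin script, this would end with `sys.exit(main())` (plus `import sys`). Keeping `evaluate` separate from the command invocation means the same logic could sit behind whatever less-privileged interface ends up exposing these query results.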
Thank you community for your consideration! 🙇