
Virtualized /proc/cpuinfo via LXCFS not working #1070

Closed
PhilippWendler opened this issue Aug 7, 2024 · 2 comments · Fixed by #1072
Labels: bug, container (related to container mode)

Comments


PhilippWendler commented Aug 7, 2024

If I start something with a core limit, e.g., runexec --cores 0 ..., and have LXCFS installed, I expect /proc/cpuinfo to contain only the allowed cores. However, on my system I get the full list of cores. This should be investigated and fixed if possible.

findmnt does show our mounts of LXCFS files over /proc, and /var/lib/lxcfs/proc/cpuinfo also shows the full list of cores, so the problem is not related to our mounts.

Potentially LXCFS needs to be started in a different cgroup.
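
A quick way to observe the problem from inside the container (a minimal sketch; using os.sched_getaffinity as the reference for the effective core limit is my assumption, not part of the report):

```python
# Minimal check, run inside a container started with `runexec --cores 0 ...`:
# if LXCFS virtualization worked, the number of "processor" entries in
# /proc/cpuinfo should match the set of cores the process may actually use.
import os

with open("/proc/cpuinfo") as f:
    visible = sum(1 for line in f if line.startswith("processor"))

allowed = len(os.sched_getaffinity(0))  # cores permitted by the cpuset
print(f"/proc/cpuinfo shows {visible} cores, {allowed} cores are allowed")
```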

PhilippWendler added labels: bug, container (related to container mode) on Aug 7, 2024
PhilippWendler commented

What happens is the following: When LXCFS receives a read request for the virtualized /proc/cpuinfo, it needs to determine the cgroup from which to take the applicable core limits. I assumed that it uses the cgroup of the process that reads cpuinfo, but in fact it uses the cgroup of the init process of the PID namespace that the reading process is in. In BenchExec's case, the init process of the container is not in the same cgroup as the processes in the container, because we do not want to measure the resource consumption of the init process. So our init process has no core limit, and that is what LXCFS reports.

In order to fix this we need to put the init process into a cgroup that has the same core limit as the cgroup of the container, but is still outside the cgroup that we use for measurements.
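
A minimal illustration of the mismatch, as observed from inside a BenchExec container (the single-line "0::/..." output format assumes cgroups v2):

```python
# Compare the cgroup that LXCFS consults (that of the container's init,
# PID 1 in this PID namespace) with the cgroup of the reading process itself.
def cgroup_of(pid):
    with open(f"/proc/{pid}/cgroup") as f:
        return f.read().strip()  # on cgroups v2 a single line like "0::/..."

print("init's cgroup:  ", cgroup_of(1))       # no core limit applies here
print("reader's cgroup:", cgroup_of("self"))  # core limit is set here
# If these differ, LXCFS reports the cores of the (unlimited) init cgroup.
```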

PhilippWendler changed the title from "LXCFS integration not working?" to "LXCFS integration not working on systems with cgroups v2" on Aug 8, 2024
PhilippWendler changed the title from "LXCFS integration not working on systems with cgroups v2" to "Virtualized /proc/cpuinfo via LXCFS not working" on Aug 8, 2024
PhilippWendler commented

The behavior of LXCFS is actually the same no matter which cgroup version we have.

However, only the files that virtualize the (potentially limited) system resources, like /proc/cpuinfo and /proc/meminfo, were affected: the virtualization of /proc/uptime was always working (and tested) because it does not rely on the cgroup of the init process.

I tried to find out when LXCFS started to use the cgroup of the init process of the container, but was not successful. It seems that it has done so basically forever, which would mean that virtualization of /proc/cpuinfo never worked inside BenchExec containers. Luckily, we at least never explicitly promised this; we only recommended LXCFS to our users because of the virtualized /proc/uptime. But users who were already familiar with the feature set of LXCFS could nevertheless have expected a virtualized /proc/cpuinfo in BenchExec.
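
For comparison, the working /proc/uptime virtualization is easy to check (a small sketch; with LXCFS active, a freshly started container should report its own lifetime rather than the host's uptime):

```python
# Read the first field of /proc/uptime: seconds since "boot", which LXCFS
# replaces with the container's lifetime when the file is virtualized.
with open("/proc/uptime") as f:
    uptime = float(f.read().split()[0])
print(f"uptime as seen here: {uptime:.1f} s")
```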

PhilippWendler added a commit that referenced this issue Aug 8, 2024
We recommend installing LXCFS together with BenchExec,
because we use it to virtualize, for example, /proc/uptime in the container.
However, a main use case of LXCFS is to virtualize files
that contain information about the system, such as the available CPU
cores in /proc/cpuinfo.
We never advertised this, but I assumed it was working all along.
It turns out that it never worked.

The reason is that LXCFS uses the limits configured for the init
process of the container, but our init process has no limits:
it is not part of the same cgroup as the other processes in the container
(on purpose, because we do not want to measure its resource consumption).
So now we create yet another cgroup for the init process
that is below the one with the limits
but outside of the one that is used for measurements.
Note: A single runexec execution will now create up to 5 cgroups.

This is made possible by the separation between the cgroups
for limits and for measurements in the previous commits.

With this change, /proc/cpuinfo now shows only the cores available in the
container if LXCFS is running.
This helps processes in the container see how many CPU cores
they are allowed to use, for example to decide how many threads to spawn.

Fixes #1070
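
A hedged sketch of the cgroup layout this commit describes, on cgroups v2 (all paths and names below are illustrative assumptions, not BenchExec's actual identifiers; running it requires root and a delegated cgroup subtree with the cpuset controller available):

```python
import os

BASE = "/sys/fs/cgroup/benchexec_run"      # hypothetical per-run cgroup
limits = os.path.join(BASE, "limits")      # holds the cpuset core limit
measure = os.path.join(limits, "measure")  # child: measured benchmark processes
init_cg = os.path.join(limits, "init")     # child: container init, excluded
                                           # from measurements

for path in (limits, measure, init_cg):
    os.makedirs(path, exist_ok=True)       # mkdir in cgroupfs creates cgroups

# Enable the cpuset controller for children of BASE so the limit can be set.
with open(os.path.join(BASE, "cgroup.subtree_control"), "w") as f:
    f.write("+cpuset")

# Restrict the whole subtree to core 0; both child cgroups inherit it,
# so LXCFS now sees the limit when it looks at the init process's cgroup.
with open(os.path.join(limits, "cpuset.cpus"), "w") as f:
    f.write("0")

init_pid = 12345  # placeholder for the container's init PID
with open(os.path.join(init_cg, "cgroup.procs"), "w") as f:
    f.write(str(init_pid))
```

The point of the layout is that the init process sits in its own leaf below the limits cgroup, so it inherits the core restriction that LXCFS reports, while the measurement cgroup remains a sibling that contains only the benchmarked processes.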
PhilippWendler linked a pull request on Aug 19, 2024 that will close this issue