nvmlDeviceGetMemoryInfo, Insufficient Permissions #842

Open
DrAuYueng opened this issue Jan 2, 2025 · 2 comments
Comments

@DrAuYueng

The container is running with a MIG device. Calling the NVML API inside the container to obtain memory info fails. Sample code:

>>> import pynvml
>>> pynvml.nvmlInit()
>>> handle = pynvml.nvmlDeviceGetHandleByIndex(0)
>>> pynvml.nvmlDeviceGetMemoryInfo(handle)

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/usr/local/lib/python3.10/dist-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NoPermission: Insufficient Permissions

Running nvidia-smi -L in the container gives the following result:
[Image: screenshot of the nvidia-smi -L output, not preserved]

How should I deal with this situation?

@frosk1

frosk1 commented Jan 14, 2025

same issue

@riccardo32

same issue

Further details: the servers are running Flatcar OS with the latest container-toolkit release, v1.17.3, and production branch driver 550.127.08.

The issue is easily reproducible on H100 and A100 MIG devices, but the call works fine if the device is not partitioned.

Please advise if you need any additional information.
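
One possible workaround, offered as an untested sketch and not a confirmed fix: the NVML documentation for nvmlDeviceGetMemoryInfo notes that in MIG mode a parent GPU handle returns aggregate information only to callers with appropriate privileges, and that per-instance information is queried through MIG device handles. The handle returned by nvmlDeviceGetHandleByIndex(0) in the sample code is a parent handle, which may explain the error on partitioned devices. The helper name mig_memory_info and its nvml parameter (the imported pynvml module) are hypothetical, introduced here only for illustration.

```python
def mig_memory_info(nvml, gpu_index=0):
    """Return a list of (used, total) byte counts, one per visible MIG instance.

    `nvml` is assumed to be the imported pynvml module, passed in explicitly
    so the helper can be exercised without a GPU. nvml.nvmlInit() must have
    been called beforehand.
    """
    parent = nvml.nvmlDeviceGetHandleByIndex(gpu_index)
    results = []
    # The max count is an upper bound on MIG slots; not every slot is populated.
    for i in range(nvml.nvmlDeviceGetMaxMigDeviceCount(parent)):
        try:
            mig = nvml.nvmlDeviceGetMigDeviceHandleByIndex(parent, i)
        except nvml.NVMLError:
            # Assumption: an error here means the slot is empty or not
            # visible in this container, so it is skipped.
            continue
        # Query the MIG handle itself, not the parent GPU handle.
        mem = nvml.nvmlDeviceGetMemoryInfo(mig)
        results.append((mem.used, mem.total))
    return results
```

Usage would be pynvml.nvmlInit() followed by mig_memory_info(pynvml). Whether this avoids the NoPermission error on the setups reported in this thread has not been verified.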
