Have the Python exporter push GPU metrics upstream to Mimir.
We can use imports from the [NVIDIA-maintained DCGM exporter](https://github.com/NVIDIA/dcgm-exporter/tree/main) where appropriate, but that is not necessary at this stage. Among other things, the DCGM exporter is fundamentally designed to be deployed into a cluster, and parts of its design may not work well for our use case without substantial work.
As a fallback implementation, we can run `nvidia-smi pmon` or `nvidia-smi dmon` to get good metrics for processes and devices, respectively, and parse the output into Prometheus metrics.
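A rough sketch of what the `dmon` fallback could look like, assuming the usual layout of two `#`-prefixed header lines (column names, then units) followed by one whitespace-separated row per GPU. The `gpu_dmon_` metric prefix, the function names, and the `SAMPLE` text are placeholders for illustration, not existing exporter code, and the exact columns depend on the driver version and the selected metric groups. This only covers parsing and rendering in the Prometheus text format; how the result gets pushed to Mimir is left open.

```python
import subprocess

# Illustrative sample only; real columns vary by driver and options.
SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    43    35     -     0     0     0     0   405   300
"""


def read_dmon():
    """Take a single dmon sample (-c 1) and return its raw text."""
    result = subprocess.run(
        ["nvidia-smi", "dmon", "-c", "1"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_dmon(text):
    """Parse dmon output into a list of {column: raw string value} dicts."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line names the columns; later ones
            # (units, repeated headers) are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            continue  # skip anything that does not match the header
        rows.append(dict(zip(columns, values)))
    return rows


def to_prometheus(rows, prefix="gpu_dmon_"):
    """Render parsed rows as Prometheus text exposition lines."""
    lines = []
    for row in rows:
        gpu = row.get("gpu", "unknown")
        for name, raw in row.items():
            if name == "gpu":
                continue
            try:
                value = float(raw)
            except ValueError:
                continue  # "-" marks a reading the device does not report
            lines.append(f'{prefix}{name}{{gpu="{gpu}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Swap SAMPLE for read_dmon() on a machine with an NVIDIA driver.
    print(to_prometheus(parse_dmon(SAMPLE)), end="")
```

Keeping the parser separate from the `subprocess` call means it can be tested against captured output without a GPU present. The same approach should carry over to `nvidia-smi pmon`, with a process ID label added alongside the GPU index.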