Also check /usr/bin/ path for nvidia cards e.g. in WSL #18

Merged (1 commit) on Dec 3, 2024

RJKeevil
Contributor

@RJKeevil RJKeevil commented Dec 2, 2024

Use an additional path to search for nvidia GPUs

@janpfeifer
Contributor

This function is called when a CUDA PJRT plugin has been found, but it is not yet known whether an actual GPU card is installed.

The use case is: a demo docker with all the PJRTs installed shouldn't attempt to run a cuda PJRT if it is running on a computer with no GPUs.

Looking at /usr/bin/nvidia will only detect that the nvidia programs are installed, not whether there is an actual GPU card installed.

That said, looking at /dev/nvidia* does not seem to be foolproof either. Let's chat later; maybe we could look at:

  • ls -ld /sys/module/nvidia*
  • ls -ld /sys/bus/pci/drivers/nvidia*

Is either of those non-empty in your container setup?
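For reference, the path checks discussed here could be sketched in Go roughly as follows. This is only an illustration, not the project's actual function: `anyPathMatches` and `nvidiaPathPatterns` are hypothetical names, and whether any of these paths is a reliable signal is exactly the open question in this thread.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// nvidiaPathPatterns lists the glob patterns mentioned in this thread.
// Assumption: a match hints that an NVIDIA driver or device is present.
var nvidiaPathPatterns = []string{
	"/dev/nvidia*",
	"/sys/module/nvidia*",
	"/sys/bus/pci/drivers/nvidia*",
}

// anyPathMatches (hypothetical helper) reports whether at least one of the
// patterns matches an existing file or directory.
func anyPathMatches(patterns []string) bool {
	for _, pattern := range patterns {
		matches, err := filepath.Glob(pattern)
		if err != nil {
			// filepath.Glob only fails on a malformed pattern; skip it.
			continue
		}
		if len(matches) > 0 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("NVIDIA paths found:", anyPathMatches(nvidiaPathPatterns))
}
```

On a Docker Desktop + WSL2 container this may well print `false` even though CUDA works, which is the case this pull request is about.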

In the meantime, I'm adding documentation and logging to the function, including logging of the workaround: providing the absolute path to the CUDA PJRT plugin. See #19

@RJKeevil
Contributor Author

RJKeevil commented Dec 3, 2024

Both of these paths are empty. I think there's some wizardry with Docker Desktop and WSL2 where the container just delegates to the CUDA drivers on the host OS. nvidia-smi is added to the PATH, so perhaps running that command is a reasonably generic way to check whether CUDA is present on a system?

@janpfeifer
Contributor

I hesitate to make the test depend on nvidia-smi being installed. For instance, the demo docker doesn't contain it, even though it works with NVIDIA CUDA. Also, I'm not sure about the distribution rights for these NVIDIA tools. The legalese is not clear to me ... but maybe it's an option. Let me search around for alternatives in Windows WSL.

@RJKeevil
Contributor Author

RJKeevil commented Dec 3, 2024

Agreed, I don't think it should depend on it; the current check could still look for NVIDIA files, with a call to nvidia-smi as a fallback for this case? I've looked further in the container, and the only other evidence I can find for the presence of CUDA is /usr/lib/wsl/drivers/nv_dispi.inf_amd64_adf5a840df867035

@janpfeifer
Contributor

Yes, checking whether nvidia-smi is available and then executing it is a very viable option. Do you want to implement that?
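A minimal Go sketch of that fallback order, file checks first and nvidia-smi only if it happens to be on the PATH, might look like the following. All function names here (`hasNvidiaFiles`, `nvidiaSMISucceeds`, `firstTrue`) are hypothetical, and treating a zero exit status from nvidia-smi as "a GPU is usable" is an assumption, as is the 10-second timeout.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"path/filepath"
	"time"
)

// hasNvidiaFiles reports whether any of the paths discussed in this thread exist.
func hasNvidiaFiles() bool {
	patterns := []string{"/dev/nvidia*", "/sys/module/nvidia*", "/sys/bus/pci/drivers/nvidia*"}
	for _, pattern := range patterns {
		if matches, err := filepath.Glob(pattern); err == nil && len(matches) > 0 {
			return true
		}
	}
	return false
}

// nvidiaSMISucceeds looks for nvidia-smi in PATH and, if found, runs it.
// Assumption: exit status 0 means the driver could reach a GPU.
func nvidiaSMISucceeds() bool {
	path, err := exec.LookPath("nvidia-smi")
	if err != nil {
		return false // Not installed, so this check cannot tell us anything.
	}
	// Bound the run time so a stuck driver cannot hang the caller.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return exec.CommandContext(ctx, path).Run() == nil
}

// firstTrue runs the checks in order and stops at the first one that succeeds.
func firstTrue(checks ...func() bool) bool {
	for _, check := range checks {
		if check() {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("NVIDIA GPU likely present:", firstTrue(hasNvidiaFiles, nvidiaSMISucceeds))
}
```

Because nvidia-smi is only consulted when the file checks fail and the binary exists, setups such as the demo docker, which does not ship nvidia-smi, would behave as before.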

@janpfeifer janpfeifer merged commit c8c81d9 into gomlx:main Dec 3, 2024
1 check passed