mmod_human_face_detector auto fallback from CUDA to CPU #3044

ha7sh17 · 2025-01-16T10:57:53Z

dlib version: 19.24
Server: NVIDIA L4

We are successfully using the mmod_human_face_detector in two modes by building dlib with DLIB_USE_CUDA set to ON and OFF for the CUDA and CPU versions, respectively.

In the CUDA version, when GPU resources are stable, the detector always returns consistent results. However, when GPU resources are constrained (e.g., SM usage approaches 100%), the returned rect coordinates and confidence values occasionally differ from the usual results. Interestingly, these "different" rect coordinates exactly match those returned by the CPU version of the mmod_human_face_detector. (In such cases, the confidence values from the CUDA version still differ slightly from those of the CPU version.)

Given this behavior, does dlib, even when built with DLIB_USE_CUDA=ON, have an automatic fallback mechanism that switches calculations to the CPU at runtime when GPU resources are insufficient?

arrufat · 2025-01-16T11:11:07Z

Currently, if you compile dlib with DLIB_USE_CUDA=ON, all the dnn parts will use the GPU. There's no way to change the backend at runtime.

7 replies

arrufat Jan 16, 2025

Honestly, I don't know, I've never experienced such a thing... I don't know enough about how the NVIDIA runtime works.

ha7sh17 Jan 16, 2025
Author

OK.
Thank you for your kind support.
I will investigate further on my side.

arrufat Jan 16, 2025

Let us know if you find out what it is.

davisking Jan 16, 2025
Maintainer

It's because changing the order in which you add floating-point numbers slightly changes the resulting value. And in CUDA there are a ton of threads doing just that. So when something changes the way those threads get scheduled, you will sometimes get slightly different results.
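
To make the point about addition order concrete, here is a minimal sketch in plain Python. It is not dlib or CUDA code: it uses ordinary double-precision floats on the CPU, and the helper name `sum_in_order` and the input values are made up for illustration. It only shows that adding the same numbers in a different order can give a different result.

```python
def sum_in_order(values):
    """Accumulate left to right, like one particular thread schedule would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Floating-point addition is not associative.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

# Same four numbers, two different orders.
order_a = sum_in_order([1e16, 1.0, -1e16, 1.0])  # 1.0 (the first 1.0 is rounded away)
order_b = sum_in_order([1e16, -1e16, 1.0, 1.0])  # 2.0

print(left == right)     # False
print(order_a, order_b)  # 1.0 2.0
```

In a network the differences are usually far smaller than in this contrived second case, but a tiny change in a score can still change which candidate box comes out on top.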

Answer selected by ha7sh17

ha7sh17 Jan 16, 2025
Author

@davisking
Thank you for your advice.
We will look more deeply into how CUDA performs floating-point calculations, to avoid getting different results.
If we succeed, I will share our solution here.

davisking Jan 16, 2025
Maintainer

It's not something you can change. It's just how the CUDA code being used here works.
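
Since the variation can't be removed, one option on the caller's side is to compare detections with a tolerance instead of expecting bit-identical output. The sketch below is only an assumption about how that might look: `detections_match`, the `(left, top, right, bottom, confidence)` tuple layout, and both tolerance values are hypothetical and not part of dlib.

```python
def detections_match(det_a, det_b, coord_tol=2, conf_tol=1e-3):
    """Return True if two detections agree within the given tolerances.

    Each detection is a (left, top, right, bottom, confidence) tuple.
    coord_tol is in pixels; both tolerances are placeholder values.
    """
    coords_ok = all(abs(x - y) <= coord_tol for x, y in zip(det_a[:4], det_b[:4]))
    conf_ok = abs(det_a[4] - det_b[4]) <= conf_tol
    return coords_ok and conf_ok


usual = (10, 20, 110, 120, 1.0500)
under_load = (11, 20, 110, 121, 1.0502)
print(detections_match(usual, under_load))  # True
```

Suitable tolerances depend on the application and would need to be chosen from observed runs.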
