Replies: 8 comments 3 replies
-
Thanks for this well-written report. The C/C++ runtime is not an internal dependency of ORT. Every application needs a runtime that allows it to run, and applications typically include this runtime when they're shipped/packaged. ORT is a low-level library; it's not intended for average end users to download. We assume that there will be applications that provide useful functionality using ORT and ship ORT along with them. It is thus incumbent on those applications to ensure all required dependencies are shipped along with ORT. These dependencies have been clearly identified and documented here. Having said that, our error messages can definitely be improved. We'll add this to our backlog.
-
Early in development we tried the CUDA package, but we discarded it because in the real world it's impossible to ensure the right drivers and dependencies are properly installed, so we are using DirectML as the safest option, and I think it's this one that requires vcredist and other dependencies. @pranavsharma you're right that any required dependency needs to be installed by the application package, but that only solves some of the problems. The other problem that has been identified is ORT loading the dependencies from the wrong folder (System32). I am aware that there are ways to tell a process to explicitly load assemblies from a specific directory, and I think this should be done by ORT, not by the end developer. We agree that error messages need to be improved to help diagnose and narrow down the source of any problem. This is especially important when the error happens on a client's machine and we only get the error via telemetry logs.
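To illustrate the "load from a specific directory" point: on Windows a process can pin native-library resolution to an explicit directory so that stale copies in System32 are not picked up first. A minimal sketch in Python (the same mechanism exists in Win32 as `AddDllDirectory`/`SetDefaultDllDirectories`); the `runtimes` folder name below is only an example layout, not something ORT defines:

```python
import os
import sys

def pin_native_library_directory(path):
    """Restrict native-library (DLL) resolution to an explicit directory.

    On Windows (Python 3.8+), os.add_dll_directory() adds `path` to the
    search path used when dependent DLLs are resolved, so the copies
    shipped next to the application win over ones in System32.
    On POSIX systems this is a no-op: the equivalent mechanism there is
    an RPATH of $ORIGIN baked into the shared library, or LD_LIBRARY_PATH.
    """
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        return os.add_dll_directory(os.path.abspath(path))
    return None

# Example: pin the app's own "runtimes" folder before loading onnxruntime.
# handle = pin_native_library_directory("runtimes")  # hypothetical layout
```

The point of the discussion is that ORT itself could do something equivalent internally when it resolves its provider DLLs, instead of leaving it to each application.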
-
We are also desperately trying to be able to use ONNX Runtime in a medical application. We see huge obstacles in knowing which files need to be distributed and where to put them on each end-user machine, so as to make sure our versions are the ones loaded even if other versions are reachable in the path on the end-user machine.

For the first part, we need file lists of which files to include depending on the platform and which providers are included. As we don't know what hardware (GPU brand, etc.) the end user has, we have to ship "the works". On Windows it is easiest to always use DirectML, but as CUDA is far faster (last we measured, on 1.13) we want to ship that too. On Linux the situation is worse, as we have to include separate providers for each GPU make.

Referring to a page where you're supposed to run an obscure (to a native C++ programmer) dotnet command that stores an unknown set of files in an unknown location is not helpful at all when we have to provide a consistent file set to end users. So far I have not found any ONNX Runtime binaries that include all provider dlls in the same package, and I'm unsure whether the onnxruntime.dll/.so in the provided single-GPU-provider packages are exchangeable or only support their respective providers, which would be useless when we need to ship multiple GPU providers.
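One mitigation until official per-provider file lists exist: verify at build or startup time that every native library you believe is required actually sits next to the application. A sketch, where the manifest contents are an assumption you would maintain yourself against the ORT/cuDNN versions you ship, not an official list:

```python
from pathlib import Path

# Hypothetical per-platform manifest; maintain this against the ORT/cuDNN
# versions you actually ship -- it is NOT an official file list.
REQUIRED_LIBS = {
    "win32": ["onnxruntime.dll"],
    "linux": ["libonnxruntime.so"],
}

def missing_native_libs(app_dir, platform_key):
    """Return the required native libraries not present in app_dir."""
    root = Path(app_dir)
    return [name for name in REQUIRED_LIBS.get(platform_key, [])
            if not (root / name).is_file()]
```

Running this as an installer post-step (or a CI packaging test) at least turns "an unknown set of files in an unknown location" into an explicit, diffable check.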
-
As for the second part, where applications may load dlls from the wrong place, picking up old ones that came with the OS, with Windows Update, or with some other random application using some random version of ONNX Runtime: there are problems related to the nested loading of provider libraries and their dependent libraries that I don't know are solvable at all, unless ONNX Runtime's cmake system allows setting RPATH (on Linux) from the build command line, or similar.
-
Finally, on the side of error-message improvements: it seems that if you append a number of provider settings, for instance to get the CUDA provider if there is an NVIDIA GPU, DirectML if there is an AMD one, or CPU if neither, you can't see what you actually got. We're really worried that our customers may miss copying some of the cuDNN-related libraries and run on CPU without knowing it (except for complaining about slowness), so we really want to be able to know where it runs and why it couldn't run on a more preferred provider.
-
I think there's a misunderstanding, at least on my side. I am not requesting that OnnxRuntime install all its dependencies for me; what I think is OnnxRuntime's responsibility is to give some way to get clear diagnostics. Right now, the current logic for running OnnxRuntime looks like this:

```csharp
try { TryInitializeCuda(); }
catch
{
    try { TryInitializeDirectML(); } // if CUDA fails, fall back to DirectML
    catch
    {
        try { TryInitializeCPU(); } // if DirectML fails, fall back to CPU
        catch
        {
            Error("No ORT available");
        }
    }
}
```

My argument against this approach is that the initialization of the fallback ORTs relies on the previous ORTs crashing, potentially leaving unmanaged garbage behind or leaving the system in an unstable state, which might affect or compromise the execution of the ORT that is eventually initialized successfully. Also, libraries that have been partially loaded cannot be unloaded. And, as @BengtGustafsson stated, it may fall back to CPU in cases where it would have been possible to fix the problem and run on CUDA or DML.

A better approach would be this:

```csharp
var cudaState = Orts.GetCudaORTDiagnostics();
if (cudaState == ReadyToRun)
{
    InitializeCuda();
    return;
}
else if (cudaState != Unavailable) // diagnose this ORT
{
    ApplicationInsights.RemoteLog(cudaState); // send telemetry so we can resolve the issue with the client by PHONE CALL
    // diagnostics should give enough information to help resolve the problem:
    // - incompatible graphics card?
    // - missing drivers?
    // - wrong CUDA driver version?
    // - invalid context? (missing files, wrong dll resolve paths, wrong file versions, who knows)
}

// fall back to DirectML if CUDA is unavailable
var dmlState = Orts.GetDirectMLDiagnostics();
if (dmlState == ReadyToRun)
{
    InitializeDML();
    return;
}
else if (dmlState != Unavailable) // diagnose this ORT
{
    ApplicationInsights.RemoteLog(dmlState); // send telemetry so we can resolve the issue with the client by PHONE CALL
    // diagnostics should give enough information to help resolve the problem:
    // - incompatible graphics card?
    // - missing VC Redistributables?
    // - invalid context? (missing files, wrong dll resolve paths, wrong file versions, who knows)
}

// fall back to CPU if DML is unavailable
InitializeCPU();
```

What's really important is that any diagnostics state given by a given ORT should be divided into three groups:
Notice that what prevents us from doing that diagnostics process ourselves is that we are, after all, end users of OnnxRuntime, and we shouldn't have to know the requirements of a given ORT in detail. And this doesn't mean there's nothing more that can be done from OnnxRuntime's side to mitigate the problem; I see DirectML as a fail-safe, and it's important that it's able to run in most circumstances, and for that, requiring VCRedist to be installed is a hassle. It could be great to have a
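The diagnose-before-initialize flow proposed above can be expressed as a small state machine. A Python sketch, where the states and the probe functions are hypothetical (nothing like `GetCudaORTDiagnostics` exists in ORT today, which is exactly the request):

```python
from enum import Enum

class ProviderState(Enum):
    READY_TO_RUN = "ready"        # all prerequisites satisfied
    DIAGNOSABLE = "diagnosable"   # present but broken: log details, maybe fixable
    UNAVAILABLE = "unavailable"   # e.g. no such GPU at all: silently skip

def select_provider(probes):
    """Walk providers in preference order.

    `probes` maps provider name -> zero-arg callable returning
    (ProviderState, detail). Returns (chosen_name, telemetry), where
    telemetry collects the diagnostic detail of every provider that was
    present but unusable (to be sent via e.g. Application Insights).
    """
    telemetry = []
    for name, probe in probes.items():
        state, detail = probe()
        if state is ProviderState.READY_TO_RUN:
            return name, telemetry
        if state is ProviderState.DIAGNOSABLE:
            telemetry.append((name, detail))
    return "CPU", telemetry  # last-resort fallback

# Example: CUDA is present but misconfigured, DirectML is usable.
probes = {
    "CUDA": lambda: (ProviderState.DIAGNOSABLE, "wrong CUDA driver version"),
    "DirectML": lambda: (ProviderState.READY_TO_RUN, None),
}
chosen, telemetry = select_provider(probes)
# chosen == "DirectML"; telemetry records why CUDA was skipped.
```

The key property is that no provider is ever *attempted* and left half-initialized: a provider is only initialized once its probe says it is ready.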
-
I think we have similar needs here. Maybe I overreached when asking for file lists, but experience with cuDNN was that there are seemingly unrelated dlls that are nevertheless needed for operation, that it is non-trivial to figure out which, and that the set changed with the cuDNN version. That said, such lists would probably best be provided by NVIDIA directly. As for the ORT provider dlls, it seems fairly easy to figure out which files are needed. As for vcredist, this is unavoidable, I think. The program can't even start if it is missing (these dlls invariably load implicitly), so there is no way for the program to report that they are missing.
-
Recently I've been coming to a hospital two or more days every week, so I'd be very glad to know ONNX Runtime can provide help there. I'm not sure what kind of diagnostics could be added to the core onnxruntime.dll; would a separate diagnostic tool be helpful? Statically linking against the VC runtimes only works in some cases. It would be problematic if onnxruntime also needs to load a custom op or a dynamically loaded EP (e.g. CUDA), because when you statically link the VC runtime, every DLL has its own heap, and there are more restrictions on passing C++ objects across DLL boundaries. Also, nowadays the VC runtime is not a single piece; see https://devblogs.microsoft.com/cppblog/introducing-the-universal-crt/ . So if your target environments are Windows 10 and above, I think it should be easy to do app-local deployment as mentioned in the article, and I think that is preferable to static linking.
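App-local deployment of the VC runtime boils down to copying the redistributable DLLs next to your executable instead of requiring a machine-wide VCRedist install. A hedged sketch: the DLL names listed are the common VS 2015+ ones, but the exact set depends on your toolset and should be taken from your own Visual Studio Redist folder, not from this example:

```python
import shutil
from pathlib import Path

# Common VC++ 2015+ redistributable DLLs; verify the set against your
# own toolset's Redist folder rather than trusting this list.
APP_LOCAL_CRT = ["vcruntime140.dll", "vcruntime140_1.dll", "msvcp140.dll"]

def deploy_app_local_crt(redist_dir, app_dir):
    """Copy the CRT DLLs next to the application (app-local deployment).

    Returns the list of DLLs that were found and copied, so the build
    can fail loudly if anything in APP_LOCAL_CRT is missing.
    """
    copied = []
    for name in APP_LOCAL_CRT:
        src = Path(redist_dir) / name
        if src.is_file():
            shutil.copy2(src, Path(app_dir) / name)
            copied.append(name)
    return copied
```

Done as an installer/packaging step, this sidesteps the "please ask hospital IT to install VCRedist" conversation entirely for Windows 10+ targets.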
-
This error will be familiar to some people here:
We thought we had this issue resolved by installing VCRedist, but we're still getting this error on some clients' machines, and the issue looks similar to #13744.
As far as I can see, this is a recurring problem that keeps popping up time and again: #116, #5449, #9260, #11230, #13744
At least two possible causes for this error have been identified:
The problem is that, after all, this is an ONNX Runtime internal dependency, and good practice says that whoever has a dependency is responsible for correctly loading it, or at least for giving meaningful diagnostic information to help the end developer fix the problem.
After reading many of the threads, and from other conversations I have had with developers here and in other MS departments, I have the impression many developers have a hard time understanding that OnnxRuntime is not limited to servers and needs to run on an average, clueless end user's machine. Asking users to install obscure dependencies is already extremely painful, and we have to go to great lengths to include them in our installer as seamlessly as possible.
Furthermore, OnnxRuntime is being used for medical applications that need to be installed on computers within hospital environments with extremely tight security. I can't share them, but I would love to share some of the chats I had with hospital IT personnel when I asked them to install VC redist, or even just to send me additional log files.
But the biggest problem is that if that exception happens on a client's machine, we only get notice of it through Application Insights logging, so we have no way to diagnose why it happened, and giving solutions that require full access to the affected machine is not useful at all.
So I would humbly ask the OnnxRuntime developers to spend some time improving the developer and end-user experience around this issue in these areas: