Troubleshoot CNTK
This page collects some of the most frequent pitfalls users encounter.
Although the model was trained with a larger set than the evaluation set, CNTK runs out of memory during evaluation.
The training section of a CNTK configuration file usually sets the minibatchSize property. When evaluating the model with CNTK.exe, make sure that minibatchSize is also appropriate for evaluation. To quickly determine whether this property is causing the issue, set it to a low value (e.g. minibatchSize=2) in the configuration file for the evaluation command (cf. Issue #468).
During eval the following error is seen: About to throw exception 'cuDNN failure 8: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=haha; expr=err'
Set the minibatchSize property to a low value (e.g. minibatchSize=2).
You must upgrade VS2013 to "Update 5". See Setting up CNTK on Windows.
I enabled Image Reader with zip support and get "Plugin not found: 'ImageReader.dll'" error when running Image Reader unit tests or trying to use the reader. What might be wrong?
Check that you have correctly installed zlib and libzip; in particular, make sure that you have not forgotten to rename zlib.dll to zlib1.dll.
There is a bug in ACML that affects some Intel chips.
We removed ACML from the CNTK build and will not support it any further. If you are still using an older version of CNTK which is using ACML, we strongly recommend switching to a CNTK version linked against MKL.
If you want to continue using ACML, add ACML_FMA=0 to the system's environment variables to work around this issue (cf. Issues #465, #506, #519). If the issue remains, you can also try different learning rates (for example 0.1, 0.01, 0.001, ...); in some cases smaller learning rates are the key.
I have just downloaded and installed the CNTK binary package and want to run a job, but I get weird errors, like missing CUDA 7.0 libraries (and I downloaded the CPU-only version!)
Please check carefully what you have in your PATH, especially if this is a shared development machine. Such errors are most likely caused by a "forgotten" CNTK.exe file, either from a previous release or built from older sources, that is reachable through the PATH.
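The check described above can be sketched as a small script that lists every CNTK.exe reachable through PATH, in search order. This is only an illustrative helper, not part of CNTK: the function name find_on_path is hypothetical, and the script assumes a Python interpreter is available on the machine.

```python
import os


def find_on_path(exe_name, path_value=None):
    """Return every file named exe_name reachable through PATH, in search order.

    find_on_path is a hypothetical helper written for this page; it is not a CNTK tool.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    matches = []
    seen = set()
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')
        if not entry:
            continue
        candidate = os.path.join(entry, exe_name)
        # Skip directories that are listed more than once in PATH.
        key = os.path.normcase(os.path.abspath(candidate))
        if key in seen:
            continue
        seen.add(key)
        if os.path.isfile(candidate):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    hits = find_on_path("CNTK.exe")
    for index, hit in enumerate(hits):
        label = "found first" if index == 0 else "shadowed by an earlier entry"
        print(f"{hit}  ({label})")
    if len(hits) > 1:
        print("More than one CNTK.exe is reachable; check that the first one is the version you expect.")
```

If more than one path is printed, the first entry is the one a plain CNTK.exe invocation would normally pick up, so an outdated copy earlier in PATH is a likely culprit.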
On Windows I installed a new version of the NVIDIA driver and now the CNTK build fails with errors like ..\Common\BestGpu.cpp(24): fatal error C1083: Cannot open include file: 'nvml.h': No such file or directory
You selected the Perform a clean installation option in the NVIDIA Driver Installer. That option removes the GPU Deployment Kit (GDK). To repair the system, perform the following steps:
- Launch CUDA Installer
- Select Custom (Advanced) Installation
- Unselect all installation options, except GPU Deployment Kit
- This automatically selects the Graphics Driver option as well; this is expected. Leave it selected
- Proceed with CUDA installation
- After CUDA has been installed successfully, launch the installer for the desired Graphics Driver version
- Select Custom (Advanced) Installation
- Ensure that Perform a clean installation is NOT selected and proceed with the installation
I'm getting one of the following exceptions: "OS call failed or operation not supported on this OS" or "EXCEPTION occurred: CUSPARSE failure 1".
One possible reason is excessive memory pressure caused by loading the whole data set into memory with the default (i.e., unlimited) randomization window. Please try running your workload with an explicit randomizationWindow value, which limits the amount of input data cached in memory. To do that, add the following parameters to your reader configuration section (10000 is used as an example; you may choose any value that fits in memory and still ensures good randomization):
randomize=true
randomizationWindow=10000 #(assuming that 10K samples << total available memory)
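To get a feel for whether a given window "fits in memory", a back-of-the-envelope estimate can help. The sketch below is only a rough lower bound under stated assumptions: samples are dense, each value is stored as a 4-byte float, and any additional reader overhead is ignored. The function names and the 784-value sample size are hypothetical and chosen purely for illustration; they are not CNTK settings.

```python
def window_bytes(window_samples, values_per_sample, bytes_per_value=4):
    """Rough memory needed to hold one randomization window of dense samples."""
    return window_samples * values_per_sample * bytes_per_value


def max_window(memory_budget_bytes, values_per_sample, bytes_per_value=4):
    """Largest window (in samples) whose raw data fits in the given budget."""
    return memory_budget_bytes // (values_per_sample * bytes_per_value)


if __name__ == "__main__":
    # Assumed example: 10000 samples with 784 float values each.
    needed = window_bytes(10000, 784)
    print(f"Approximate raw size of the window: {needed / 1e6:.1f} MB")
    # Assumed budget of 2 GiB reserved for cached input data.
    print(f"Largest window for that budget: {max_window(2 * 1024**3, 784)} samples")
```

If the estimate is close to the memory actually available, choose a smaller randomizationWindow, keeping in mind that real usage will be higher than this raw figure.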
I get errors when using the Eval C# library EvalWrapper.dll in an Azure web app, such as "Could not load file or assembly 'some CNTK dlls'", a System.Runtime.InteropServices.SEHException exception, or "InternalServiceFault: External component has thrown an exception.".
First, please make sure that all CNTK dependency dlls are deployed to the Azure web app.
Then set your Azure web app to use a 64-bit VM. To allow the Azure web app to load the CNTK unmanaged dlls, change the PATH variable by adding the following code to the Application_Start() method in global.asax:
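// Read the current PATH of the web app process.
string pathValue = Environment.GetEnvironmentVariable("PATH");
// Folder expected to hold the CNTK dlls: bin under the application base directory.
string domainBaseDir = AppDomain.CurrentDomain.BaseDirectory;
string cntkPath = domainBaseDir + @"bin\";
// Append that folder so the unmanaged CNTK dlls can be found when they are loaded.
pathValue += ";" + cntkPath;
Environment.SetEnvironmentVariable("PATH", pathValue);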
Please see the "Evaluate a model in an Azure WebApi" page for detailed steps.