Cuda runtime error #18

Open
capuzz opened this issue Dec 11, 2017 · 0 comments


Hi,
thanks for the reply to my previous issue. I now have another question.
I'm training AlexNet, and training seems to work, but when the network is supposed to be saved (in SequentialTrainer.lua) with this call:

network:save(net_path,self._roi_means,self._roi_stds)

I get this error:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-T1qml2/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/torch/install/bin/luajit: /home/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-T1qml2/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
/home/torch/install/share/lua/5.1/torch/File.lua:351: in function </home/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
/home/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/torch/install/share/lua/5.1/nn/Module.lua:193: in function 'read'
/home/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/torch/install/share/lua/5.1/nn/Module.lua:141: in function 'clone'
./network/Net.lua:125: in function 'save'
./train/SequentialTrainer.lua:150: in function '_trainBatch'
./train/SequentialTrainer.lua:97: in function 'train'
./network/NetworkWrapper.lua:40: in function 'trainNetwork'
main_train.lua:48: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

The problem seems to be in this line in Net.lua:

tmp_regressor = self.regressor:clone()

but I don't know why.
I tried reducing the size of the dataset, thinking it was a GPU memory problem, but the error persists.
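
For context on the trace above: nn.Module:clone() serializes the module to an in-memory file and reads it back (the clone -> readObject frames), so cloning a CUDA module allocates a second copy of its tensors on the GPU. That allocation depends on the model size, not the dataset size, which may be why shrinking the dataset made no difference. Below is a minimal sketch for checking free GPU memory around the clone call. It assumes cutorch is installed; `toMiB` and `reportGpuMemory` are hypothetical helper names, not part of this repository.

```lua
-- Load cutorch if available; on failure `cutorch` holds the error message.
local ok, cutorch = pcall(require, 'cutorch')

-- Hypothetical helper: convert a byte count to MiB.
local function toMiB(bytes)
   return bytes / (1024 * 1024)
end

-- Hypothetical helper: print free/total memory of the current GPU.
-- Returns free and total bytes, or nil when cutorch is not available.
local function reportGpuMemory(tag)
   if not ok then return nil end
   local free, total = cutorch.getMemoryUsage(cutorch.getDevice())
   print(string.format('[%s] GPU memory: %.0f MiB free of %.0f MiB',
                       tag, toMiB(free), toMiB(total)))
   return free, total
end

reportGpuMemory('before clone')
-- tmp_regressor = self.regressor:clone()   -- the failing line in Net.lua
reportGpuMemory('after clone')
```

If the "before clone" figure is smaller than the regressor itself, calling collectgarbage() right before the clone might free enough memory for it to succeed, though that is an untested assumption here.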
