Doesn't work with the current master branch of caffe #1

Open
watts4speed opened this issue Dec 15, 2015 · 12 comments

@watts4speed
Owner

The network blows up and doesn't train when using the current head of caffe/master. Any help or insight into what's going on is appreciated.

@tpbarron

When you say the network blows up do you mean that the gradients explode? I can't access my computer with a GPU right now so I can only do CPU training. But running it a few times using the current master branch and your suggested commit, I don't see anything obvious that makes me think it's not training properly. The error plot slowly trends up.

@watts4speed
Owner Author

Hi Trevor,

Yeah, I'm seeing the gradients explode. Maybe it's something with my setup. You should see things blow up pretty quickly, even with CPU training. If you get to about 50 episodes or more, you should see the plot clearly trending upward. Thanks for checking this out, by the way.

Are you using the current head of caffe/master?

@tpbarron

Yeah, I'm fully up to date with caffe/master. If it helps, I'm on Ubuntu 14.04 using OpenBLAS. I'll check the training again later just to make sure.

@watts4speed
Owner Author

Cool! This is very helpful. Maybe it's the NVIDIA stuff; that would be really useful to know. I've heard there are issues with Caffe and the CUDA Toolkit 7.5 or cuDNN, but I don't know the details.

@tpbarron

I'll have to try this on the GPU when I get home later this week. I tried interfacing with a Minecraft game, and I do get exploding gradients on the head of caffe/master but not with the commit from September. It's possible I introduced a bug in the interface, but it's likely that something else is still off. I'm not sure why it would give me problems with this setup but not before.

@watts4speed
Owner Author

Hi Trevor,

Yeah, you're hitting the problem. I see the same thing. Something seems to have changed in Caffe, and I haven't been able to figure it out yet. You can play with the learning rate or with clipping the loss, but in the end I think there's a bug or some incompatibility with Caffe. I haven't had time to do any controlled experiments, such as feeding known inputs and seeing if something is obviously wrong.
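To be concrete, the kind of solver tweaks I mean would look roughly like the sketch below. The field names are standard Caffe SolverParameter fields, but the filename and the values here are placeholders, not what's actually in this repo:

```
# Hypothetical excerpt of a DQN solver prototxt -- values are examples only.
net: "dqn.prototxt"       # placeholder network definition file
base_lr: 0.01             # try lowering this if the gradients explode
lr_policy: "fixed"
# Clip the global L2 norm of the gradients to this value.
# Caffe's default is -1, which disables clipping.
clip_gradients: 10
```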

@joyousrabbit

Hello watts4, great work. Is the issue resolved?

@watts4speed
Owner Author

watts4speed commented Jan 1, 2017 via email

@joyousrabbit

joyousrabbit commented Jan 1, 2017

@watts4speed You are right, it's the learning rate. After changing base_lr from 0.1 to 0.01, without clip_gradients, it works fine.
The reason, I guess, is that some commit in Caffe optimized the performance (the newer version effectively trains as if the learning rate were higher), so the old base_lr of 0.1 is too big for the new Caffe release.
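Concretely, the only change I made was the base_lr line in the solver prototxt, something like this (just a sketch; the other fields stay whatever the repo's solver file already sets):

```
# Before (diverges on newer Caffe):
#   base_lr: 0.1
# After (trains fine for me, with no clip_gradients set):
base_lr: 0.01
```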

For the game Breakout, it gets a score of 45 during training; however, in evaluation it reaches only 3 (it stays at the left or right and almost never moves). Why?

@watts4speed
Owner Author

watts4speed commented Jan 1, 2017 via email

@chshong

chshong commented May 27, 2017

Could this be due to using the solver prototxt parameter 'solver_type', which is deprecated in newer versions of Caffe? It seems that setting solver_type to ADADELTA has no effect and the default SGD solver is used instead.
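If that is the cause, the fix on newer Caffe would be to switch from the deprecated enum field to the string-valued type field, roughly like this (a sketch only; the rest of the solver file stays as it is in the repo):

```
# Old, deprecated enum syntax -- this is what I suspect is being ignored:
#   solver_type: ADADELTA
# Newer syntax:
type: "AdaDelta"
```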

@watts4speed
Owner Author

watts4speed commented May 28, 2017 via email
