Doesn't work with the current master branch of caffe #1

Open
watts4speed opened this issue Dec 15, 2015 · 12 comments

@watts4speed
Owner

The network blows up and doesn't train when using the current head of caffe/master. Any help or insight into what's going on is appreciated.

@tpbarron

When you say the network blows up do you mean that the gradients explode? I can't access my computer with a GPU right now so I can only do CPU training. But running it a few times using the current master branch and your suggested commit, I don't see anything obvious that makes me think it's not training properly. The error plot slowly trends up.

@watts4speed
Owner Author

Hi Trevor,

Yeah, I'm seeing the gradients explode. Maybe it's something with my setup. You should see things blow up pretty quickly, even with CPU training. If you get to about 50 episodes or more, you should see the plot clearly trending upward. Thanks for checking this out, by the way.

Are you using the current head of caffe/master?

@tpbarron

Yeah, I'm fully up to date with caffe/master. If it helps, I'm on Ubuntu 14.04 using OpenBLAS. I'll check the training again later just to make sure.

@watts4speed
Owner Author

Cool! This is very helpful. Maybe it's the NVIDIA stuff; that would be really useful to know. I've heard there are issues with Caffe and the CUDA Toolkit 7.5 or cuDNN, but I don't know the details.

@tpbarron

I'll have to try this on the GPU when I get home later this week. I tried interfacing with a Minecraft game, and I do get exploding gradients on the head of caffe/master but not with the commit from September. It's possible I introduced a bug in the interface, but it's likely that something else is still off. I'm not sure why it would give me problems with this setup but not before.

@watts4speed
Owner Author

Hi Trevor,

Yeah, you're hitting the problem. I see the same thing. Something seems to have changed in Caffe, and I haven't been able to figure it out yet. You can play with the learning rate or with clipping the loss, but in the end I think there's a bug or some incompatibility with Caffe. I haven't had time to do any controlled experiments, such as feeding known inputs and seeing if something is obviously wrong.
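To be concrete, the kind of solver tweaks I mean would look roughly like the sketch below. The field names are standard Caffe SolverParameter fields, but the filename and the values here are placeholders, not what's actually in this repo:

```
# Hypothetical excerpt of a DQN solver prototxt -- values are examples only.
net: "dqn.prototxt"       # placeholder network definition file
base_lr: 0.01             # try lowering this if the gradients explode
lr_policy: "fixed"
# Clip the global L2 norm of the gradients to this value.
# Caffe's default is -1, which disables clipping.
clip_gradients: 10
```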

@joyousrabbit

Hello watts4, great work. Is the issue resolved?

@watts4speed
Owner Author

watts4speed commented Jan 1, 2017 via email

@joyousrabbit

joyousrabbit commented Jan 1, 2017

@watts4speed You are right, it's the learning rate. After changing base_lr from 0.1 to 0.01, without clip_gradients, it works fine.
The reason, I guess, is that some commit in Caffe optimized the performance (the newer version effectively trains as if the learning rate were higher), so the old base_lr of 0.1 is too big for the new Caffe release.
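Concretely, the only change I made was the base_lr line in the solver prototxt, something like this (just a sketch; the other fields stay whatever the repo's solver file already sets):

```
# Before (diverges on newer Caffe):
#   base_lr: 0.1
# After (trains fine for me, with no clip_gradients set):
base_lr: 0.01
```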

For the game Breakout, it gets a score of 45 during training; however, in evaluation it reaches only 3 (it stays at the left or right and almost never moves). Why?

@watts4speed
Owner Author

watts4speed commented Jan 1, 2017 via email

@chshong

chshong commented May 27, 2017

Could this be due to using the solver prototxt parameter 'solver_type', which is deprecated in newer versions of Caffe? It seems that setting solver_type to ADADELTA has no effect and the default SGD solver is used instead.
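If that is the cause, the fix on newer Caffe would be to switch from the deprecated enum field to the string-valued type field, roughly like this (a sketch only; the rest of the solver file stays as it is in the repo):

```
# Old, deprecated enum syntax -- this is what I suspect is being ignored:
#   solver_type: ADADELTA
# Newer syntax:
type: "AdaDelta"
```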

@watts4speed
Owner Author

watts4speed commented May 28, 2017 via email
