PPO2 clip value loss #765

yuhsh24 · 2018-12-17T08:59:24Z

If the value is clipped to reduce variability during critic training, why does the code use tf.maximum(tf.square(vpred - R), tf.square(vpredclipped - R)) instead of tf.minimum? Please tell me the reason.
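To make the question concrete, here is a minimal NumPy sketch of the two candidate losses. It is an illustration, not the baselines code: the function name `clipped_value_losses` and the toy numbers are hypothetical, and it assumes the clipped prediction is formed as the old prediction plus the clipped difference.

```python
import numpy as np

def clipped_value_losses(vpred, old_vpred, returns, clip_range):
    """Return (max-combined, min-combined) per-sample value losses."""
    # Assumption: the clipped prediction stays within clip_range of the old one.
    vpred_clipped = old_vpred + np.clip(vpred - old_vpred, -clip_range, clip_range)
    loss_unclipped = np.square(vpred - returns)
    loss_clipped = np.square(vpred_clipped - returns)
    return (np.maximum(loss_unclipped, loss_clipped),
            np.minimum(loss_unclipped, loss_clipped))

# Toy data: the old prediction is 0 and the return target is 1 for both samples.
old_vpred = np.array([0.0, 0.0])
returns = np.array([1.0, 1.0])
# Sample 0 moved toward the target past the clip range; sample 1 moved away from it.
vpred = np.array([0.5, -0.5])

loss_max, loss_min = clipped_value_losses(vpred, old_vpred, returns, clip_range=0.2)
# Sample 0: the maximum selects the clipped term, the minimum the unclipped term.
# Sample 1: the maximum selects the unclipped term, the minimum the clipped term.
print(loss_max, loss_min)
```

When clipping is active, the clipped term does not depend on vpred, so it contributes no gradient. In this toy case the maximum therefore stops rewarding a large step toward the target but still penalizes a large step away from it, while the minimum would do the reverse.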

NoelFeiyang · 2018-12-18T07:10:22Z

I have the same question.
Also, the clip range in tf.clip_by_value(vpred - OLDVPRED, -CLIPRANGE, CLIPRANGE) looks odd, because it is an absolute bound that ignores the scale of the value. Should it perhaps be tf.clip_by_value(vpred - OLDVPRED, -tf.abs(OLDVPRED) * CLIPRANGE, tf.abs(OLDVPRED) * CLIPRANGE)?
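The difference between the two clip ranges can be sketched in NumPy as follows. This is only an illustration of the suggestion above: `clip_absolute` and `clip_relative` are hypothetical helper names, the numbers are made up, and both helpers assume the clipped prediction is the old prediction plus the clipped difference.

```python
import numpy as np

def clip_absolute(vpred, old_vpred, clip_range):
    # Fixed-width band: the same absolute bound regardless of the value's scale.
    return old_vpred + np.clip(vpred - old_vpred, -clip_range, clip_range)

def clip_relative(vpred, old_vpred, clip_range):
    # Proposed variant: the band width scales with |old_vpred|.
    bound = np.abs(old_vpred) * clip_range
    return old_vpred + np.clip(vpred - old_vpred, -bound, bound)

# One small-magnitude and one large-magnitude value estimate.
old_vpred = np.array([0.1, 100.0])
vpred = np.array([0.5, 130.0])

abs_clipped = clip_absolute(vpred, old_vpred, clip_range=0.2)
rel_clipped = clip_relative(vpred, old_vpred, clip_range=0.2)
print(abs_clipped, rel_clipped)
```

One caveat with the relative variant, at least in this sketch: the band collapses to zero width wherever the old prediction is exactly 0, so the clipped prediction cannot move at all there.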

rallen10 · 2019-01-15T16:47:27Z

Does anyone have further insight on either/both of the questions posed?

brett-daley · 2019-03-10T11:36:54Z

Issue #445 has some discussion on this

shtse8 · 2020-08-27T07:00:52Z

I have the same question too. I searched for an answer but had no luck. Please explain why it uses maximum instead of minimum.

shtse8 · 2020-08-27T07:48:15Z

OK, I have the answer now.
Baselines computes pgloss1 and pgloss2 with a negative sign, so they are losses to be minimized rather than objectives to be maximized. That is why it takes the maximum afterward.
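The sign argument can be checked numerically. The sketch below is an assumption-laden illustration rather than the baselines code: `surrogate_objective` and `surrogate_loss` are hypothetical names, the ratio and advantage values are made up, and the loss form simply negates the two terms as described above. It relies on the identity max(-a, -b) = -min(a, b).

```python
import numpy as np

def surrogate_objective(ratio, adv, clip_range):
    # Objective form (to be maximized): elementwise minimum of the two terms.
    clipped_ratio = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return np.minimum(ratio * adv, clipped_ratio * adv)

def surrogate_loss(ratio, adv, clip_range):
    # Loss form (to be minimized): both terms negated, then the elementwise maximum.
    clipped_ratio = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    pgloss1 = -adv * ratio
    pgloss2 = -adv * clipped_ratio
    return np.maximum(pgloss1, pgloss2)

# Made-up probability ratios and advantages covering both signs of the advantage.
ratio = np.array([0.5, 0.9, 1.1, 1.5])
adv = np.array([1.0, -2.0, 3.0, -0.5])

objective = surrogate_objective(ratio, adv, clip_range=0.2)
loss = surrogate_loss(ratio, adv, clip_range=0.2)
# The loss is the negated objective, so minimizing one maximizes the other.
print(np.allclose(loss, -objective))
```

The squared value error is already a quantity to be minimized, so under the same reading the maximum there is the pessimistic choice, just as the minimum is for an objective.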

rallen10 mentioned this issue Dec 18, 2018

PPO2 Why combine loss function when parameters not shared between policy and value? #766

Closed
