How does the PPO2 value function max-clip reduce variability? #1140

Open
Asuka20 opened this issue Aug 30, 2020 · 2 comments
Comments

Asuka20 commented Aug 30, 2020

Hi, why does the value function loss use a maximum instead of a minimum when clipping?
Suppose clipping occurs with v_pred_old < v_clipped < v_pred < R (or the reverse ordering). In that case the clipped loss is larger than the unclipped loss. Why would that reduce variability?
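
To make the case above concrete, here is a minimal NumPy sketch of the two squared-error terms being compared. It assumes the clipped prediction is formed as `v_pred_old + clip(v_pred - v_pred_old, -clip_range, clip_range)`; the function name `value_losses` and the numbers are only illustrative, not taken from the library.

```python
import numpy as np


def value_losses(v_pred, v_pred_old, returns, clip_range):
    """Return (unclipped, clipped) squared errors per sample.

    Hypothetical helper for illustration; assumes the clipped prediction
    stays within clip_range of the old prediction.
    """
    v_clipped = v_pred_old + np.clip(v_pred - v_pred_old, -clip_range, clip_range)
    unclipped = np.square(v_pred - returns)
    clipped = np.square(v_clipped - returns)
    return unclipped, clipped


# Scenario from the question: v_pred_old < v_clipped < v_pred < R
v_pred_old = np.array([0.0])
v_pred = np.array([0.5])
returns = np.array([1.0])

unclipped, clipped = value_losses(v_pred, v_pred_old, returns, clip_range=0.2)

loss_with_max = 0.5 * np.mean(np.maximum(unclipped, clipped))  # picks the clipped term
loss_with_min = 0.5 * np.mean(np.minimum(unclipped, clipped))  # picks the unclipped term

print(unclipped, clipped, loss_with_max, loss_with_min)
```

With these numbers the unclipped error is 0.25 and the clipped error is about 0.64, so the maximum selects the larger, clipped term, which is exactly the situation I am asking about. Note that in this sketch the clipped term does not depend on `v_pred` once clipping is active.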

shtse8 commented Aug 31, 2020

I have the same question, and I don't even understand why we need to clip the value loss at all.

viai957 commented Sep 20, 2020

I think you should consider using a minimum for the clipped loss function.
