Questions about the tensor shapes #1

Closed
typoverflow opened this issue Jan 12, 2023 · 3 comments

@typoverflow commented Jan 12, 2023

Hi,
thanks for sharing the code! I have been trying to port the code to PyTorch, but after some reading, a few questions came up:

  1. The advantage is normalized with the batch mean and std. I'm wondering whether subtracting the batch mean here will cause trouble for the estimation of over-estimation, since it may flip the sign of some advantages within the batch.

    advantages = tf.stop_gradient((advantages - tf.reduce_mean(advantages)) / tf.math.reduce_std(advantages))

  2. The shape of the loss is somewhat weird to me. If I understand correctly, the supervised loss supervised_loss = self._model.train_loss is summed over dim 0 (the ensemble dimension) and averaged over dims 1 and 2, but the adversarial loss is summed over the batch dimension, since at Line 967 both the advantages and log_prob have shape [256,]. This is strange to me because we normally don't sum over the batch dimension (see the shape sketch below).
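To illustrate the shapes I'm referring to, here is a standalone sketch (the ensemble/output sizes are made up, and the names are mine, not the repo's actual code):

    import tensorflow as tf

    per_ensemble_loss = tf.ones([7, 256, 20])  # [ensembles, batch, output dim] (sizes made up)
    advantages = tf.ones([256])                # [batch]
    log_prob = tf.ones([256])                  # [batch]

    # Supervised loss: summed over dim 0 (ensembles), averaged over dims 1 and 2 -> scalar.
    supervised_loss = tf.reduce_sum(tf.reduce_mean(per_ensemble_loss, axis=[1, 2]))

    # Adversarial loss: elementwise product of two [256] vectors -> shape [256,],
    # i.e. still one loss per sample rather than a scalar.
    adversarial_loss = advantages * log_prob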

Please correct me if there is any misunderstanding.

@marc-rigter (Owner)

Hey,

Thanks for your interest in this project; it would be fantastic to have a PyTorch implementation of this!

  1. Subtracting the mean is common practice in advantage normalisation. You can find this, for example, in OpenAI Baselines (Understanding normalization of advantage function openai/baselines#544). I believe the justification is that it is essentially the same as subtracting a baseline from the policy gradient loss. So long as the baseline does not depend on the action, this does not bias the gradient in expectation (see, for example, the "REINFORCE with Baseline" section in the Sutton and Barto RL textbook); a sketch of the argument follows below.

I'm not sure whether this subtraction makes any difference in practice: I don't think I tested without it, as I just implemented the normalisation the same way as OpenAI Baselines.
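To sketch the usual argument (written for a discrete action space; the continuous case replaces the sum with an integral): for any baseline b(s) that does not depend on the action,

$$\mathbb{E}_{a \sim \pi_\theta}\big[b(s)\,\nabla_\theta \log \pi_\theta(a \mid s)\big] = b(s) \sum_a \pi_\theta(a \mid s)\,\nabla_\theta \log \pi_\theta(a \mid s) = b(s)\,\nabla_\theta \sum_a \pi_\theta(a \mid s) = b(s)\,\nabla_\theta 1 = 0,$$

so the baseline term contributes zero to the expected gradient. (Dividing by the batch std, with stop_gradient applied, just rescales the gradient for that batch and likewise does not change its direction.)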

  2. Thanks for pointing this out. You're right, I implemented this in a strange way, which I did not realise before you pointed it out. From what I can tell, the "total_loss" that is fed to the optimizer has shape [256,], corresponding to the loss for each sample in the batch, rather than the mean across the batch dimension. I should have used reduce_mean to make this a scalar.

It's unclear to me from the documentation how TensorFlow handles this vector of losses: whether it is reduced to a scalar by taking a sum or a mean. Do you know if it is definitely a sum? If it is, then the implication is that the scale of the gradients will depend on the batch size, so changing the batch size might require a corresponding change to the learning rate. This is certainly not ideal, as it makes things more difficult to tune if the batch size is modified. However, I won't modify the code in this repo for now, as I don't think this is a major issue and I want to keep it the same as the code that generated the results in the paper. A sketch of the fix I have in mind is below.
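For reference, a minimal sketch of that fix (variable names are illustrative, and the sign/weighting of the adversarial term is shown schematically rather than copied from the repo):

    import tensorflow as tf

    advantages = tf.ones([256])  # [batch]
    log_prob = tf.ones([256])    # [batch]

    per_sample_adv_loss = advantages * log_prob     # shape [256,]
    adv_loss = tf.reduce_mean(per_sample_adv_loss)  # scalar; the gradient scale is now
                                                    # independent of the batch size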

I hope this makes sense, and please let me know if I have gotten something wrong! Thanks again for your interest in this work :)

@typoverflow (Author)

Do you know if it is definitely a sum?

I'm not that familiar with TF, so I just tried the following snippet:

>>> with tf.GradientTape() as g:
...   x = tf.constant([1,2,3], dtype=tf.float32)
...   g.watch(x)
...   y1 = tf.reduce_sum(x)
...   y2 = tf.reduce_sum(x)
...   y3 = [y1, y2]
...
>>> grads = g.gradient(y3, x)
>>> tf.Session().run(grads)
array([2., 2., 2.], dtype=float32)

The returned gradient indicates a sum over the batch dimension: each element of y3 contributes a gradient of ones with respect to x, and the contributions are added together, giving [2., 2., 2.].

If it is the sum, then I guess the implication is that the scale of the gradients will depend on the batch size, so that if the batch size is changed, this might require a change in the learning rate to account for this.

Actually, both a change in the learning rate and a change in the weighting term for the adversarial loss versus the supervised loss would be required (see the sketch below). But as long as the batch size is kept the same (256) as the one that produced the numbers in the paper, I don't think this is a big problem.
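To spell out the coupling under one reading (my own back-of-the-envelope, assuming the supervised term is already a scalar and only the adversarial term is effectively summed over the batch; w is the adversarial weighting term and B the batch size):

$$L_\text{total} = L_\text{sup} + w \sum_{i=1}^{B} L^{(i)}_\text{adv} = L_\text{sup} + (wB)\,\bar{L}_\text{adv},$$

so changing B away from 256 rescales the effective adversarial weight by B/256 and inflates the gradient of that term by the same factor, which is why both w and the learning rate would need adjusting.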

@marc-rigter (Owner)

Ah cool, that makes sense. Thanks a lot for pointing this out and investigating this. Sorry for the slight error!
