Since the start-of-sentence token is always fixed, I noticed a small improvement when detaching it during training. I guess this helps build a better association between the "V* category" and the target image, and thus improves generation for inference-time prompts.
```python
if crossattn:
    # Mask that is 1 everywhere except the first (start-of-sentence) token position.
    detach = torch.ones_like(key)
    detach[:, :1, :] = detach[:, :1, :] * 0.
    # Let gradients flow through all tokens except the first, whose key/value are detached.
    key = detach * key + (1 - detach) * key.detach()
    value = detach * value + (1 - detach) * value.detach()
```
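For reference, here is a minimal, self-contained sketch of the same selective-detach trick on a dummy key tensor (the shapes and variable names are illustrative, not the repository's): after the masked mix, no gradient reaches the first token position, while the remaining tokens still receive gradients.

```python
import torch

# Dummy cross-attention key: (batch, tokens, dim); shapes are illustrative only.
key = torch.randn(2, 77, 320, requires_grad=True)

detach = torch.ones_like(key)
detach[:, :1, :] = 0.                                # zero the mask at the SOS token
mixed = detach * key + (1 - detach) * key.detach()   # gradient blocked only where mask == 0

mixed.sum().backward()
print(key.grad[:, 0].abs().sum())  # tensor(0.)  -> no gradient through the SOS token
print(key.grad[:, 1].abs().sum())  # nonzero     -> gradient flows for every other token
```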
Why stop the gradient of the first key-value pair here?