Hello. Thanks for the great work. I am trying to understand some results in the paper, specifically Tables 4 and 6. Could you clarify them for me? Thank you!
Table 4:
In Table 4 the methods have the same throughput, meaning they process the same number of images per second. What I understand is that, for example against DDIM with 59 steps, DeepCache uses fewer full inference steps plus some shallow inference steps, but still has the same total inference time. For example, with N_skip = 2, DeepCache might perform only 40 full steps and use shallow network inference (partial inference) for the other 40 intermediate steps. With this setup, the two have the same image quality according to the table.
If that is so, what is the point of using DeepCache in this setup? I would expect a comparison like this: against full-inference DDIM with 50 steps, DeepCache with N_skip = 2 gives ~2x faster inference, but image quality drops by this much in FID.
Table 6:
Here is the part of the paper that explains Table 6: "Results presented in Table 6 indicate that, with the additional computation of the shallow U-Net, DeepCache improves the 50-step DDIM by 0.32 and the 10-step DDIM by 2.98."
I am trying to understand this. Specifically, you mention:
"Steps here mean the number of steps that perform full model inference."
Does this mean that both DDIM and DeepCache perform the same number of full U-Net inference steps, but DeepCache adds shallow network inference on top? If so, is that why you don't compare inference times in this table, because in this case DeepCache would be slower?
Maybe my questions are silly, sorry for that. I am just trying to understand the tradeoff between image quality and inference speed for Stable Diffusion models. For example, DeepCache is implemented in the diffusers library, and their example script just uses it with N_skip = 2 and DDIM as the default scheduler, and we see improved speed. But it is not clear to me how much the image quality drops when we use your method with Stable Diffusion models.
For Table 4, the authors want to show that DeepCache can achieve almost the same performance as DDIM. As for why the FID increase of DeepCache under the same number of steps is not shown there, it may be because it is already shown earlier in the paper.
The quality improvement from DeepCache in Table 6 comes from doubling the total number of DeepCache steps, with only the shallow features computed at the added steps. As mentioned in the previous paragraph, the deterioration in FID under the same number of steps is already reported in Tables 1, 2, and 3.
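To make the step accounting in the two tables concrete, here is a rough sketch. It assumes a uniform schedule in which one full U-Net pass is run every `cache_interval` steps and every other step is a shallow pass. The function names and the `shallow_cost` value (the cost of a shallow step relative to a full step) are placeholders for illustration and are not taken from the paper or the DeepCache code.

```python
import math


def step_budget(total_steps: int, cache_interval: int) -> tuple[int, int]:
    """Return (full_steps, shallow_steps) for a uniform caching schedule.

    Assumes a full pass at steps 0, N, 2N, ... and shallow passes elsewhere.
    """
    full = math.ceil(total_steps / cache_interval)
    return full, total_steps - full


def relative_cost(total_steps: int, cache_interval: int,
                  shallow_cost: float = 0.25) -> float:
    """Cost in units of one full U-Net pass.

    shallow_cost is a hypothetical placeholder, not a measured value.
    """
    full, shallow = step_budget(total_steps, cache_interval)
    return full + shallow * shallow_cost


# Table 6 style: keep 50 full steps, add 50 shallow steps in between.
print(step_budget(100, 2))    # (50, 50)
print(relative_cost(100, 2))  # more than the 50 units of plain 50-step DDIM

# Same total number of steps: 50 steps, half of them shallow.
print(step_budget(50, 2))     # (25, 25)
print(relative_cost(50, 2))   # fewer than 50 units, so faster than DDIM
```

Under this toy model, the Table 6 setting costs more than plain DDIM with the same number of full steps, which is consistent with that table not being a speed comparison. The same-step-count setting is the one that trades FID for speed.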