Integrated resampling into samplers #24

Open
azuisleet opened this issue May 31, 2024 · 1 comment


azuisleet commented May 31, 2024

The current implementation of the dy and smea methods applies them only once or twice, at an early step. As-is, these resampling methods produce a lot of noise that the sampler struggles to denoise, if it can at all.

Additionally, on the steps where they are applied, all the proposed techniques are essentially applied twice per sampling step. While this doesn't always collapse the latent space, it produces artifacts such as blur, softening, detail loss, or spurious detail. The current implementation boils down to
x = x + d * dt + d * dt
which doesn't seem sound.
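
For concreteness, here is a minimal sketch (in k-diffusion/PyTorch style, not taken from this repository) of what that pattern looks like next to a standard Euler step:

```python
import torch

def euler_step(x, denoised, sigma, sigma_next):
    # Standard Euler update in k-diffusion style: one update per step.
    d = (x - denoised) / sigma   # derivative estimate
    dt = sigma_next - sigma      # step size (negative, since sigmas decrease)
    return x + d * dt

def double_applied_step(x, denoised, sigma, sigma_next):
    # The pattern described above: the same update applied twice per step,
    # i.e. x = x + d * dt + d * dt, which overshoots the intended ODE step.
    d = (x - denoised) / sigma
    dt = sigma_next - sigma
    return x + d * dt + d * dt
```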

Instead, the techniques should be integrated into the sampler at each step, treating the noise from resampling (up for smea and down for dy) as a kind of ancestral noise.

I don't know the math well enough to say whether the techniques themselves are valid, or whether the way they are applied is valid, but the effect is interesting.

(attached image: xyz_grid-0027-56745646)

  • The euler dy image here doesn't have any significant issues besides the self-portrait, but the method often ruins the style of generations or removes detail that would otherwise have been there.

I have integrated the smea method into euler and euler a, using the noise produced by the resampling process: a vector is chosen between the denoised and resampled vectors.

However, the original rescale factor of 1.25 produces too much noise for the subsequent denoising steps, so I only increase the dimensions by 2-4 units at most. Any more and noise and artifacts appear along the grid of the matrix, which typically turn into confetti if you're lucky.
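
Roughly, the integration looks like the following sketch. This is simplified, not the fork's actual code; the model call signature, the interpolation mode, and the blend weight here are placeholders:

```python
import torch
import torch.nn.functional as F

def smea_euler_step(model, x, sigma, sigma_next, extra_px=4):
    # Sketch only: `model(x, sigma)` stands in for the denoiser call;
    # `extra_px` is the small dimension increase (a few latent units)
    # mentioned above, not a value taken from the fork.
    denoised = model(x, sigma)

    # Resample: denoise a slightly enlarged copy, then scale it back down.
    b, c, h, w = x.shape
    x_up = F.interpolate(x, size=(h + extra_px, w + extra_px),
                         mode='bilinear', align_corners=False)
    denoised_up = model(x_up, sigma)
    resampled = F.interpolate(denoised_up, size=(h, w),
                              mode='bilinear', align_corners=False)

    # Choose a vector between the denoised and resampled predictions;
    # their difference plays the role of the resampling "noise".
    blend = 0.5  # assumed interpolation weight
    target = torch.lerp(denoised, resampled, blend)

    d = (x - target) / sigma
    dt = sigma_next - sigma
    return x + d * dt
```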

I tried integrating the dy method, but I don't understand the operations well enough to make it work. I believe it samples at 1/2 size, but the resampling process creates so much noise that it collapses very quickly.

edit:
I've forked the repository so this can be evaluated independently of the experiments here.

https://github.com/azuisleet/k-diffusion-rsm-sampler

@Koishi-Star
Owner

I am not a professional, so please forgive me for using language that may not adhere strictly to scientific standards.

1. Inspiration for Euler dy from Global Attention
The inspiration for Euler dy comes from global attention. Is there a way to enable the AI to relate two features that are far apart? I take a small block (1,4,1,1) from each 1x4x2x2 region in the latent space to represent that small area, and then merge these small blocks into a new latent space for denoising. The assumption here is that the AI still works at the smaller scale. Afterwards, I put these blocks back into the original latent space. This way, the entire image contains features derived from a complete image. Even though two features are far apart, due to the existence of the dy step they still retain a connection, which in a sense enhances global attention.
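
As a rough sketch (simplified, not the exact code in the samplers), the idea is something like: take one element from each 2x2 spatial patch, denoise the resulting half-size latent, and scatter the results back.

```python
import torch

def dy_subsample_roundtrip(model, x, sigma):
    # Illustrative sketch: take one (1,4,1,1) block from every 1x4x2x2
    # region, i.e. one corner of each 2x2 spatial patch.
    small = x[:, :, ::2, ::2]

    # Denoise at the smaller scale (assuming the model still behaves there).
    small_denoised = model(small, sigma)

    # Put the denoised blocks back into their original positions, so the
    # full latent contains features derived from a complete (smaller) image.
    out = x.clone()
    out[:, :, ::2, ::2] = small_denoised
    return out
```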

2. Why Does the Ancestral Algorithm Not Work Well for dy?
For dy, the ancestral algorithm adds random noise at each step, which gradually cancels out the connections during the noise addition process. For smea dy, the crude scaling distorts the noise levels in the ancestral method, leading to poor results.

3. Why Not Apply It at Every Step?
According to practical tests, applying the method at every step does not significantly improve the results compared to applying it at just two steps, and it consumes extra computational power, so I decided against it. As for euler smea dy, its core idea is to reduce large-scale features to a smaller scale using representative values, but I haven't found a good way to accomplish this.

4. Regarding x = x + d * dt + d * dt
This is not a big issue as long as dt remains consistent (or similar); it is akin to performing an image-to-image transformation. This pertains to the scheduler part. If you're interested, you can try constructing a wavy scheduler out of small straight-line segments (make sure it is monotonically decreasing and positive). You will see in the real-time previews that this explanation holds.
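
For example, a toy wavy schedule could be built like this (just a sketch of the idea, with made-up numbers):

```python
import math
import torch

def wavy_sigma_schedule(n, sigma_max=14.6, sigma_min=0.03, wobble=0.15):
    # Toy example: start from a straight line between sigma_max and
    # sigma_min, add a small wave, then enforce that the schedule stays
    # positive and monotonically decreasing.
    t = torch.linspace(0, 1, n)
    base = sigma_max + (sigma_min - sigma_max) * t        # straight line
    step = (sigma_max - sigma_min) / n
    wave = wobble * step * torch.sin(t * 6 * math.pi)     # small wobble
    sigmas = base + wave

    # Running minimum keeps it non-increasing; clamp keeps it positive.
    sigmas = torch.cummin(sigmas, dim=0).values.clamp(min=sigma_min)
    return torch.cat([sigmas, sigmas.new_zeros(1)])  # trailing zero, k-diffusion style
```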

P.S. I spent about two weeks on euler dy ancestral and found I can't solve it without a stronger theoretical foundation, so I'm now trying to learn Stable Diffusion properly and to design better samplers. Overall, thank you for using and liking it :)
