-
-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement tile size scaling for the compute renderer #2065
base: master
Are you sure you want to change the base?
Implement tile size scaling for the compute renderer #2065
Conversation
As far as I can tell, the absurd line count is github going insane over the changes to the shaders. |
It seems more that the files' line endings got converted from LF to CRLF for some reason (text editor/IDE goof?) |
You're right, looks like VSCodium got me. Should be fairly easy to fix. |
00dd600
to
6f8ce9f
Compare
Ok, looks like it's fixed but I'm not sure the line endings for the template I added to Utils.h aren't messed up. Thanks for the tip @CasualPokePlayer. |
Just tested this on my end, and it indeed seems like this PR makes the issue disappear completely. |
it doesnt fix it completely, but very close. see: #2047 (comment) |
Tile scaling alone will never fix the problem completely. At scales 13x and higher it would need tiles larger than 32x32. This isn't feasible because there's a limit of 1024 on the product of the local sizes in compute shaders. Implementations are free to raise this limit, but nVidia (which is the one implementation that has the problem with UB to begin with) doesn't, so we're limited to 32x32 tiles (the tile size is used as the x and y local sizes in at least one shader). As for 4x and 8x, it's what I wrote in the OP. My branch scales on a factor of 4.5, it looks like more aggressive scaling is needed. Specifically, using a factor of 3.25 (scaling at 4x and 7x, instead of 5x and 9x) seems like it should be fine. 13x and higher will still be a problem, so I was unsure if it was even worth changing and I decided to wait for @RSDuck's input. FYI, the worst-case scenario I used for testing was MKDS's Desert Hills fire balls (near the pyramid, during laps 2 and 3). You have to try and get them as close to the camera as possible. |
I found this implementation flawless in Phantom Hourglass and Spirit Tracks, at any resolution from 1x to 16x. A far as I can tell, it completely fixes this issue. What's the holdup on getting it merged? Is this planned to be part of the November v1.0 release? |
I tested this PR on an Intel Arc A380 GPU 6GB. Multiple AMD GPUs, I tried on an Apple M3 but realized compute shader renderer is not supported there, and I tested it a lot on my main Nvidia 3000 series gpu. I even tested it on RC 1.0, didn't seem to cause any problems. |
Hopefully this will be officially merged into main! It fixes all issues I noticed testing on my RTX 3090, currently using 1.0RC with this fix merged into it. |
Based on @RSDuck's comments in #2047.
Right now it scales starting at 4.5x (effectively 5x). Based on some more recent testing, it looks like it should start scaling at 3.25x (effectively 4x). See this gist for more information. It also contains notes on the rationale behind the values for
ClearCoarseBinMaskLocalSize
.The value used for
CoarseTileCountY
when using 32x32 tiles was chosen to match the base resolution's vertical resolution of 192 (6 32x32 tiles vertically per coarse tile means coarse tiles end up being 256x192).