Rename `ext::Blur` to `ext::CentralLimitBoxBlur` and rework ex 25 #115

achalpandeyy · 2021-05-22T10:09:53Z

~~I still gotta fix the mirror wrapping case (which I'll fix soon), apart from that it is ready for review~~
~~Some changes are exactly the same as GPU Radix Sort #75 i.e. addition of nbl_glsl_workgroupBroadcast for floating point numbers~~


achalpandeyy · 2022-07-19T03:05:07Z

devshgraphicsprogramming · 2022-07-21T16:19:30Z

include/nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters.glsl

+#define _NBL_GLSL_EXT_BLUR_GET_PARAMETERS_DECLARED_
+#endif
+
+#ifndef _NBL_GLSL_EXT_BLUR_PARAMETERS_METHODS_DEFINED_


they should have a forward declaration in all honesty

I don't get it, forward declared where?

either you forward declare the getters, like this

uvec3 nbl_glsl_ext_Blur_Parameters_t_getDimensions();

and define later under an ifdef like you currently do

or do like we currently do for "pseudo methods" (stuff that could be struct methods in HLSL or C)

uvec3 nbl_glsl_ext_Blur_Parameters_t_getDimensions(in nbl_glsl_ext_Blur_Parameters_t this) { return this.input_dimensions.xyz; }
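The two options above can be sketched in C++ (the thread itself is about GLSL, where the same overloads are legal). This is only an illustration: the struct layout, the `g_params` global standing in for push constants, and the shortened names are all assumptions, not the extension's actual code.

```cpp
#include <cassert>
#include <cstdint>

struct uvec3 { uint32_t x, y, z; };
struct uvec4 { uint32_t x, y, z, w; };

// Hypothetical stand-in for nbl_glsl_ext_Blur_Parameters_t.
struct Parameters_t { uvec4 input_dimensions; };

// Option 1: forward declare the getter so every include can call it
// before (and regardless of where) it gets defined.
uvec3 Parameters_t_getDimensions();

// Option 2: a "pseudo method" that receives the struct explicitly,
// the way a member function would receive `this`.
inline uvec3 Parameters_t_getDimensions(const Parameters_t& self)
{
    return { self.input_dimensions.x, self.input_dimensions.y, self.input_dimensions.z };
}

// Default definition of the forward-declared getter. The guard lets a user
// define the macro and supply their own definition instead.
#ifndef PARAMETERS_METHODS_DEFINED
#define PARAMETERS_METHODS_DEFINED
// Hypothetical global; in the shader this would be the push-constant block.
static Parameters_t g_params = { { 640u, 480u, 1u, 0u } };
uvec3 Parameters_t_getDimensions() { return Parameters_t_getDimensions(g_params); }
#endif
```

With the declaration visible up front, callers compile no matter whether the default or a user-provided definition ends up in the final source.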

devshgraphicsprogramming · 2022-07-21T16:21:19Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+#ifndef _NBL_EXT_BLUR_C_BLUR_PERFORMER_INCLUDED_
+#define _NBL_EXT_BLUR_C_BLUR_PERFORMER_INCLUDED_
+
+#include "nabla.h"


@AnastaZIuk tell us if this is the proper way to include nabla for an extension

devshgraphicsprogramming · 2022-07-21T16:22:20Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+struct alignas(16) uvec4
+{
+    uint x, y, z, w;
+};
+#include "nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl"


enclose in namespace impl

Can we instead make this private to CBlurPerformer?

Like we did for the GPU blit.

how would that even work if I wanted to access/mutate these members from outside of the class?
I mean the type definition is private, how would that work!? (godbolt example to prove it's possible please)

P.S. Does GPU blit have a bug then?

devshgraphicsprogramming · 2022-07-21T16:23:11Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+    static inline void defaultBarrier()
+    {
+        video::COpenGLExtensionHandler::extGlMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
+    }


you need to provide barrier APIs similar to the blit extension

I don't think the blit API has this.
Do we want to take a similar route to CScanner?

yes please, but what does the blit API have for helping with barriers then?

devshgraphicsprogramming · 2022-07-21T16:26:10Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+    _NBL_STATIC_INLINE_CONSTEXPR uint32_t DEFAULT_WORKGROUP_SIZE = 256u;
+    _NBL_STATIC_INLINE_CONSTEXPR uint32_t PASSES_PER_AXIS = 3u;
+
+    typedef nbl_glsl_ext_Blur_Parameters_t Parameters_t;


use this cool trick

struct Parameters_t : impl::nbl_glsl_ext_Blur_Parameters_t { };

then we can turn buildParameters from a static function to a constructor

If we do this.
Then it can just be:

struct Parameters_t : nbl_glsl_ext_Blur_Parameters_t { };

it could, but I'm a little hazy about a public class inheriting from a private one
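For reference, a minimal standalone sketch of the pattern under discussion. C++ access control applies to names, not to types, so a public nested struct may derive from a private nested struct and outside code can still read and write the inherited public members. The class and member names here are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

class BlurPerformerSketch
{
    // Private implementation struct: code outside the class cannot *name* it.
    struct ParametersImpl
    {
        uint32_t channelCount = 0u;
        float radius = 0.f;
    };

public:
    // Public type deriving from the private one. A nested class is a member
    // of the enclosing class, so it is allowed to name ParametersImpl.
    struct Parameters_t : ParametersImpl
    {
        Parameters_t() = default;
        // A constructor can take over the role of a static build function.
        Parameters_t(uint32_t channels, float r)
        {
            channelCount = channels;
            radius = r;
        }
    };
};
```

Outside the class, `BlurPerformerSketch::Parameters_t p(4u, 0.5f); p.radius = 1.f;` compiles, while `BlurPerformerSketch::ParametersImpl q;` does not.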

devshgraphicsprogramming · 2022-07-21T16:26:23Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+    static inline uint32_t buildParameters(uint32_t numChannels, const asset::VkExtent3D& inputDimensions, Parameters_t* outParams, DispatchInfo_t* outInfos,
+        const float radius, const asset::ISampler::E_TEXTURE_CLAMP* wrappingType, const asset::ISampler::E_TEXTURE_BORDER_COLOR* borderColor = nullptr)


document what unit the radius is measured in

turn this into a Parameters_t constructor

> document what unit the radius is measured in

Sure, but it is measured in NDC space, isn't it?

> turn this into a Parameters_t constructor

buildParameters also builds the DispatchInfo_t; add a constructor there as well?

> Sure, but it is measured in NDC space, isn't it?

I don't remember anymore, please deduce; it could just as well be normalized texture coordinates.

> buildParameters also builds the DispatchInfo_t; add a constructor there as well?

We can probably get rid of DispatchInfo_t because it's pointless to keep it around (the way you work out dispatch sizes is really simple); we can just have a dispatchHelper which takes passAxis/passIndex.

P.S. we still need a function to tell us how many passes we need.
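A rough sketch of what such a helper pair could look like. Everything here is an assumption made for illustration: that one workgroup blurs one whole line along the pass axis (so that axis collapses to a single workgroup), that the repeated box passes for one axis run inside a single dispatch, and that axes of extent 1 are skipped. `Extent3D`, `DispatchSize` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Extent3D { uint32_t width, height, depth; };  // stand-in for asset::VkExtent3D
struct DispatchSize { uint32_t x, y, z; };

// ASSUMPTION: one workgroup processes one full line along `passAxis`,
// so the workgroup count along that axis is 1 and the other two axes
// contribute one workgroup per element.
inline DispatchSize dispatchHelper(const Extent3D& dims, uint32_t passAxis)
{
    assert(passAxis < 3u);
    DispatchSize size = { dims.width, dims.height, dims.depth };
    if (passAxis == 0u) size.x = 1u;
    else if (passAxis == 1u) size.y = 1u;
    else size.z = 1u;
    return size;
}

// ASSUMPTION: one dispatch per axis, and an axis of extent 1 needs no blur.
inline uint32_t getPassCount(const Extent3D& dims)
{
    uint32_t passes = 0u;
    if (dims.width > 1u) ++passes;
    if (dims.height > 1u) ++passes;
    if (dims.depth > 1u) ++passes;
    return passes;
}
```

Under these assumptions a 1920x1080 2D image needs two dispatches: 1x1080x1 workgroups for the X pass and 1920x1x1 for the Y pass.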

achalpandeyy · 2022-07-22T04:55:13Z

include/nbl/builtin/glsl/ext/CentralLimitBoxBlur/default_compute_blur.comp

+layout (set = _NBL_GLSL_EXT_BLUR_INPUT_SET_DEFINED_, binding = _NBL_GLSL_EXT_BLUR_INPUT_BINDING_DEFINED_, std430) restrict readonly buffer InputBuffer
+{
+	float in_values[];
+};
+
+#endif
+
+#ifndef _NBL_GLSL_EXT_BLUR_OUTPUT_DESCRIPTOR_DEFINED_
+#define _NBL_GLSL_EXT_BLUR_OUTPUT_DESCRIPTOR_DEFINED_
+
+layout (set = _NBL_GLSL_EXT_BLUR_OUTPUT_SET_DEFINED_, binding = _NBL_GLSL_EXT_BLUR_OUTPUT_BINDING_DEFINED_, std430) restrict writeonly buffer OutputBuffer
+{
+	float out_values[];
+};


Do we really want the default input and output descriptors to be float SSBOs?

well, summed area tables require LOTS of precision, also the only way to use 16bit types would be as STBs (imageBuffer) of R16_UNORM (16bit types in SSBOs require non-ubiquitous extensions).

Realistically speaking, whenever you use an 8-bit format, your LUT would need to be 16 or 32 bit, and whenever you use 16-bit your LUT really needs to be 32-bit

In fact float isn't that great, a "software" 32bit UNORM would have been better but we would have had to store a float denormalization constant per row.

What we should really be asking ourselves is "should we be giving double or uint64_t unorm as an intermediate storage option" ?

P.S. you can write this down as a comment, but please don't do anything about it now.
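The precision argument can be made concrete with a small sketch. It treats samples as unsigned integers (an assumption for illustration; the shader works on normalized floats) and counts how many bits the largest possible prefix sum needs, then shows where a 32-bit float, with its 24-bit significand, stops counting exactly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to store the largest possible prefix sum exactly when
// `sampleCount` samples of `bitsPerSample` bits each are accumulated.
inline uint32_t prefixSumBits(uint32_t bitsPerSample, uint64_t sampleCount)
{
    assert(bitsPerSample >= 1u && bitsPerSample <= 32u);
    assert(sampleCount >= 1u && sampleCount <= (1ull << 31));
    const uint64_t maxSample = (1ull << bitsPerSample) - 1ull;
    const uint64_t maxSum = maxSample * sampleCount;  // fits in 63 bits
    uint32_t bits = 0u;
    for (uint64_t v = maxSum; v != 0ull; v >>= 1)
        ++bits;
    return bits;
}

// Returns true if adding 1 to `value` still changes a 32-bit float.
// volatile forces each intermediate to be rounded to float precision.
inline bool floatStillCountsAt(uint32_t value)
{
    const volatile float f = static_cast<float>(value);
    const volatile float next = f + 1.0f;
    return next != f;
}
```

A row of 256 8-bit samples already needs 16 bits and a row of 65536 16-bit samples needs 32, which matches the rule of thumb above; a float accumulator stops resolving unit steps once the running sum reaches 2^24.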

achalpandeyy · 2022-07-22T07:02:29Z

src/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.cpp

+CBlurPerformer::CBlurPerformer(video::IVideoDriver* driver, uint32_t maxDimensionSize, bool useHalfStorage)
+    : m_maxBlurLen(maxDimensionSize), m_halfFloatStorage(useHalfStorage)
+{
+    static IGPUDescriptorSetLayout::SBinding bnd[] =
+    {
+        {
+            0u,
+            EDT_STORAGE_BUFFER,
+            1u,
+            ISpecializedShader::ESS_COMPUTE,
+            nullptr
+        },
+        {
+            1u,
+            EDT_STORAGE_BUFFER,
+            1u,
+            ISpecializedShader::ESS_COMPUTE,
+            nullptr
+        },
+    };
+
+    m_dsLayout = driver->createGPUDescriptorSetLayout(bnd, bnd + sizeof(bnd) / sizeof(IGPUDescriptorSetLayout::SBinding));
+
+    auto pcRange = getDefaultPushConstantRanges();
+    m_pplnLayout = driver->createGPUPipelineLayout(pcRange.begin(), pcRange.end(), core::smart_refctd_ptr(m_dsLayout));
+


I think we should create a static create function instead of a constructor?

yes, agreed 100%
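A minimal sketch of the static-create pattern being agreed on here, using `std::unique_ptr` as a stand-in for Nabla's reference-counted pointer. The validation rule (rejecting a zero dimension) and the class name are assumptions; the point is only that a factory can report failure by returning null, which a constructor cannot do without throwing.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

class BlurPerformerSketch
{
public:
    // Factory: validates its inputs and returns nullptr on failure
    // instead of handing out a half-initialized object.
    static std::unique_ptr<BlurPerformerSketch> create(uint32_t maxDimensionSize, bool useHalfStorage)
    {
        if (maxDimensionSize == 0u)  // hypothetical validation rule
            return nullptr;
        // std::make_unique cannot reach the private constructor, hence `new`.
        return std::unique_ptr<BlurPerformerSketch>(
            new BlurPerformerSketch(maxDimensionSize, useHalfStorage));
    }

    uint32_t getMaxBlurLength() const { return m_maxBlurLen; }
    bool usesHalfStorage() const { return m_halfFloatStorage; }

private:
    // Private so that create() is the only way to obtain an instance.
    BlurPerformerSketch(uint32_t maxDimensionSize, bool useHalfStorage)
        : m_maxBlurLen(maxDimensionSize), m_halfFloatStorage(useHalfStorage) {}

    uint32_t m_maxBlurLen;
    bool m_halfFloatStorage;
};
```

In the real class, failures of descriptor set layout, pipeline layout or shader creation would be the natural reasons to return null.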


devshgraphicsprogramming · 2022-07-29T13:43:37Z

be consistent with naming, the macros/defines should have EXT_CENTRAL_LIMIT_BOX_BLUR instead of EXT_BLUR

the nbl_glsl_ext_Blur should be nbl_glsl_ext_CentralLimitBoxBlur as well

devshgraphicsprogramming · 2022-07-29T13:44:02Z

include/nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl

+#ifdef __cplusplus
+#define uint uint32_t
+#endif


you don't even use this

devshgraphicsprogramming · 2022-07-29T13:48:23Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+namespace nbl
+{
+namespace ext
+{
+namespace CentralLimitBoxBlur
+{
+
+struct uvec4
+{
+    uint32_t x, y, z, w;
+};
+#include "nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl"
+
+class CBlurPerformer final : public core::IReferenceCounted
+{
+public:
+    _NBL_STATIC_INLINE_CONSTEXPR uint32_t DefaultWorkgroupSize = 256u;
+    _NBL_STATIC_INLINE_CONSTEXPR uint32_t PassesPerAxis = 3u;


few syntax pointers:

use nested namespace, like this namespace nbl::ext::CentralLimitBoxBlur

you can write static inline constexpr now, no need for the _NBL_STATIC_INLINE_CONSTEXPR
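Both pointers together, as a standalone C++17 sketch (the class is reduced to the two constants and renamed to mark it as illustrative):

```cpp
#include <cstdint>

// C++17 nested namespace definition replaces three nested namespace blocks.
namespace nbl::ext::CentralLimitBoxBlur
{

class CBlurPerformerSketch
{
public:
    // Since C++17 a static constexpr data member is implicitly inline,
    // so it needs no out-of-class definition; spelling out `inline` is
    // redundant but harmless.
    static inline constexpr uint32_t DefaultWorkgroupSize = 256u;
    static inline constexpr uint32_t PassesPerAxis = 3u;
};

}  // namespace nbl::ext::CentralLimitBoxBlur

static_assert(nbl::ext::CentralLimitBoxBlur::CBlurPerformerSketch::PassesPerAxis == 3u,
              "constants are usable in constant expressions");
```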

devshgraphicsprogramming · 2022-07-29T13:58:34Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+                // At this point I'm wondering do you even need both input and output strides
+                params.output_strides = params.input_strides;


I think the idea was to be able to transpose the array after every pass so that the stores are coalesced (match invocation ID increase with output memory address increase) just like we do in the Blit

If you can deduce the input strides from output strides somehow, great.

in reality we don't need 8 separate values, we probably only need 4:

YZ, ZX, XY products

XYZ full product
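One possible reading of this remark, sketched below: if the extents X, Y, Z are already available, the strides of any transposed layout can be rebuilt from a pairwise product plus the full XYZ product, so separate input and output stride vectors need not both be stored. The cyclic axis ordering and the function names are assumptions made for the sketch, not the layout the extension actually uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Dims = std::array<uint32_t, 3>;     // X, Y, Z extents
using Strides = std::array<uint32_t, 4>;  // strides for x, y, z, channel

// Strides for a layout in which `fastAxis` is contiguous in memory and the
// remaining axes follow in cyclic order (an assumed convention).
inline Strides stridesForFastAxis(const Dims& dims, uint32_t fastAxis)
{
    assert(fastAxis < 3u);
    const uint32_t a = fastAxis;
    const uint32_t b = (fastAxis + 1u) % 3u;
    const uint32_t c = (fastAxis + 2u) % 3u;
    Strides s{};
    s[a] = 1u;
    s[b] = dims[a];
    s[c] = dims[a] * dims[b];            // one of the XY, YZ, ZX products
    s[3] = dims[0] * dims[1] * dims[2];  // full XYZ product, the channel stride
    return s;
}

// Linear address as the dot product of (x, y, z, channel) with the strides.
inline uint32_t linearIndex(const std::array<uint32_t, 4>& coordAndChannel, const Strides& s)
{
    uint32_t index = 0u;
    for (uint32_t i = 0u; i < 4u; ++i)
        index += coordAndChannel[i] * s[i];
    return index;
}
```

Reading with the strides for one fast axis and writing with the strides for the next one transposes the data between passes, which keeps the stores of the following pass coalesced.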

devshgraphicsprogramming · 2022-07-29T13:59:25Z

include/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.h

+    core::smart_refctd_ptr<video::IGPUComputePipeline> m_ppln = nullptr;
+
+    uint32_t m_maxBlurLen;
+    bool m_halfFloatStorage;


unused variable?

devshgraphicsprogramming · 2022-07-29T14:08:04Z

src/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.cpp

+    auto pcRange = getDefaultPushConstantRanges();
+    m_pplnLayout = device->createPipelineLayout(pcRange.begin(), pcRange.end(), core::smart_refctd_ptr(m_dsLayout));
+
+    auto specShader = createSpecializedShader("nbl/builtin/glsl/ext/CentralLimitBoxBlur/default_compute_blur.comp", m_maxBlurLen, useHalfStorage ? 1u : 0u, device);


do you need m_halfFloatStorage afterward?

devshgraphicsprogramming · 2022-07-29T14:09:02Z

src/nbl/ext/CentralLimitBoxBlur/CBlurPerformer.cpp

+core::smart_refctd_ptr<video::IGPUSpecializedShader> CBlurPerformer::createSpecializedShader(const char* shaderIncludePath, const uint32_t axisDim, const bool useHalfStorage, video::ILogicalDevice* device)
+{
+    std::ostringstream shaderSourceStream;
+    shaderSourceStream
+        << "#version 460 core\n"
+        << "#define _NBL_GLSL_WORKGROUP_SIZE_ " << DefaultWorkgroupSize << "\n" // Todo(achal): Get the workgroup size from outside
+        << "#define _NBL_GLSL_EXT_BLUR_PASSES_PER_AXIS_ " << PassesPerAxis << "\n" // Todo(achal): Get this from outside?
+        << "#define _NBL_GLSL_EXT_BLUR_AXIS_DIM_ " << axisDim << "\n"
+        << "#define _NBL_GLSL_EXT_BLUR_HALF_STORAGE_ " << (useHalfStorage ? 1 : 0) << "\n"
+        << "#include \"" << shaderIncludePath << "\"\n";


you know shaderIncludePath, don't let it be a parameter
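A sketch of the source-building part with the include path fixed inside the function rather than passed in. It covers only the string assembly from the hunk above; the function name is hypothetical and the shader compilation and specialization steps are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Builds the GLSL source that configures and then includes the blur shader.
inline std::string buildBlurShaderSource(uint32_t workgroupSize, uint32_t passesPerAxis,
                                         uint32_t axisDim, bool useHalfStorage)
{
    // The path is known to the extension, so it is a constant here
    // instead of a parameter the caller could get wrong.
    constexpr const char* shaderIncludePath =
        "nbl/builtin/glsl/ext/CentralLimitBoxBlur/default_compute_blur.comp";

    std::ostringstream shaderSourceStream;
    shaderSourceStream
        << "#version 460 core\n"
        << "#define _NBL_GLSL_WORKGROUP_SIZE_ " << workgroupSize << "\n"
        << "#define _NBL_GLSL_EXT_BLUR_PASSES_PER_AXIS_ " << passesPerAxis << "\n"
        << "#define _NBL_GLSL_EXT_BLUR_AXIS_DIM_ " << axisDim << "\n"
        << "#define _NBL_GLSL_EXT_BLUR_HALF_STORAGE_ " << (useHalfStorage ? 1 : 0) << "\n"
        << "#include \"" << shaderIncludePath << "\"\n";
    return shaderSourceStream.str();
}
```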

devshgraphicsprogramming · 2022-07-29T14:10:22Z

include/nbl/builtin/glsl/workgroup/shared_blur.glsl

+#ifndef _NBL_BUILTIN_GLSL_WORKGROUP_SHARED_BLUR_INCLUDED_
+#define _NBL_BUILTIN_GLSL_WORKGROUP_SHARED_BLUR_INCLUDED_
+


this include is in the wrong place and "namespace", should be in ext/CentralLimitBoxBlur

devshgraphicsprogramming · 2022-07-29T14:11:40Z

include/nbl/builtin/glsl/ext/CentralLimitBoxBlur/default_compute_blur.comp

+	const uint strided_idx = nbl_glsl_dot(uvec4(coordinate, channel), nbl_glsl_ext_Blur_Parameters_t_getInputStrides());
+
+	float data = 0.f;
+	if (all(lessThan(coordinate, dims)))
+		data = in_values[strided_idx];
+
+	return data;
+}
+
+#endif
+
+#ifndef _NBL_GLSL_EXT_BLUR_SET_DATA_DEFINED_
+#define _NBL_GLSL_EXT_BLUR_SET_DATA_DEFINED_
+
+void nbl_glsl_ext_Blur_setData(in uvec3 coordinate, in uint channel, in float val)
+{
+	const uint channel_count = nbl_glsl_ext_Blur_Parameters_t_getChannelCount();
+	const uvec3 dims = nbl_glsl_ext_Blur_Parameters_t_getDimensions();
+
+	if (all(lessThan(coordinate, dims)))
+	{
+		const uint strided_idx = nbl_glsl_dot(uvec4(coordinate, channel), nbl_glsl_ext_Blur_Parameters_t_getOutputStrides());


maybe use our new snakeCurve addressing functions instead of nbl_glsl_dot ?

achalpandeyy added 17 commits May 10, 2021 18:36

Single pass is working, should be smooth sailing from here. This commit doesn't compile because there is no nbl_glsl_workgroupBroadcast function yet for floats --that change is coming in the next commit.

f523961

Add nbl_glsl_workgroupBroadcast for floats

6d4dd3e

Make it (only the scan) work in hacky way by doing SPILLAGE == VT

5ae5171

Handle multiple channels

141a3bd

1D/directional blur working without any downscaling

96542fb

2D blur working

0e62797

taking first steps towards blur extension

57d4e7d

2x2 by downscale and minor cleaning and DRYing

bfe6621

add getCoordinates

65ee5cb

extend to 3D, channel count and direction push constants packed into one

b62cb3f

input/output strides

f6f94a1

add CLAMP_TO_EDGE and REPEAT wrapping modes with much cleaner and correct code

1bec441

add CLAMP_TO_BORDER and (broken) MIRROR

1385769

mouse input to control blur radius

2f3c0b0

almost done with CPU side

ee92a29

blur algorithm header and more cleanups

2bcdce7

fix mirror

af69a50

achalpandeyy added 3 commits July 19, 2022 11:55

Remove examples_tests directory.

9067299

Merge master.

ac02165

Merge remote-tracking branch 'upstream/master' into blur

dd9aa23


achalpandeyy added 2 commits July 22, 2022 12:42

CBlurPerformer cleanups: use bitfieldExtract in GLSL instead of manually pulling out values and, separate out a descriptors.glsl from default_compute_blur.comp.

b733718

CBlurPerformer: Forgot to commit descriptors.glsl in the previous commit, and utility for creating specialized shaders.

ae53795


AnastaZIuk force-pushed the master branch 3 times, most recently from a07e79f to ffbd843 Compare January 18, 2024 21:24

devshgraphicsprogramming closed this Oct 15, 2024
