Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename ext::Blur to ext::CentralLimitBoxBlur and rework ex 25 #115

Closed

Conversation

achalpandeyy
Copy link
Contributor

@achalpandeyy achalpandeyy commented May 22, 2021

For #61

  • I still gotta fix the mirror wrapping case (which I'll fix soon), apart from that it is ready for review
  • Some changes are exactly the same as GPU Radix Sort #75 i.e. addition of nbl_glsl_workgroupBroadcast for floating point numbers

@achalpandeyy
Copy link
Contributor Author

achalpandeyy commented Jul 19, 2022

Merge to master TODOs:

  • Merge achalpandeyy/blur with the current master and make ex25 compile and work.
    • Get ex25 running on old API.
    • Merge with master.
    • Separate out ex25 source in the dedicated examples_tests repo. PR
    • Port to new API and make everything work.
  • API modifications:
    • Rethink default input and output descriptors.
    • Separate out a descriptors.glsl from default_compute_blur.comp.
    • Default barrier api/dispatchHelper.
    • Utility function for creating shaders.
  • Resolve comments after review.

#define _NBL_GLSL_EXT_BLUR_GET_PARAMETERS_DECLARED_
#endif

#ifndef _NBL_GLSL_EXT_BLUR_PARAMETERS_METHODS_DEFINED_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they should have a forward declaration in all honesty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, forward declared where?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either you forward declare the getters, like this

uvec3 nbl_glsl_ext_Blur_Parameters_t_getDimensions();

and define later under an ifdef like you currently do

or do like we currently do for "pseudo methods" (stuff that could be struct methods in HLSL or C)

uvec3 nbl_glsl_ext_Blur_Parameters_t_getDimensions(in nbl_glsl_ext_Blur_Parameters_t this)
{
    return this.input_dimensions.xyz;
}

#ifndef _NBL_EXT_BLUR_C_BLUR_PERFORMER_INCLUDED_
#define _NBL_EXT_BLUR_C_BLUR_PERFORMER_INCLUDED_

#include "nabla.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnastaZIuk tell us if this is the proper way to include nabla for an extension

Comment on lines 18 to 22
struct alignas(16) uvec4
{
uint x, y, z, w;
};
#include "nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enclose in namespace impl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead make this private to CBlurPeformer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would that even work if I wanted to access/muate these members from outside of the class?
I mean the type definition is private, how would that work!? (godbolt example to prove its possible please)

P.S. Does GPU blit have a bug then?

Comment on lines 41 to 44
static inline void defaultBarrier()
{
video::COpenGLExtensionHandler::extGlMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to provide barrier APIs similar to the blit extension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the blit API has this.
Do we want to take a similar route to CScanner?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please, but what does the blit API have for helping with barriers then?

_NBL_STATIC_INLINE_CONSTEXPR uint32_t DEFAULT_WORKGROUP_SIZE = 256u;
_NBL_STATIC_INLINE_CONSTEXPR uint32_t PASSES_PER_AXIS = 3u;

typedef nbl_glsl_ext_Blur_Parameters_t Parameters_t;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use this cool trick

struct Parameters_t : impl::nbl_glsl_ext_Blur_Parameters_t 
{
};

then we can turn buildParameters from a static function to a constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this.
Then it can just be:

struct Parameters_t : nbl_glsl_ext_Blur_Parameters_t
{
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could, but I'm a little hazy about a public class inheriting from a private one

Comment on lines +51 to +52
static inline uint32_t buildParameters(uint32_t numChannels, const asset::VkExtent3D& inputDimensions, Parameters_t* outParams, DispatchInfo_t* outInfos,
const float radius, const asset::ISampler::E_TEXTURE_CLAMP* wrappingType, const asset::ISampler::E_TEXTURE_BORDER_COLOR* borderColor = nullptr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document what unit the radius is measured in

turn this into a Parameters_t constructor

Copy link
Contributor Author

@achalpandeyy achalpandeyy Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document what unit the radius is measured in

Sure, but it is measured in NDC space, isn't it?

turn this into a Parameters_t constructor

buildParameters also builds the DispatchInfo_t --add constructor there as well?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but it is measured in NDC space, isn't it
I don't remember anymore, please deduce, could be normalized tex coord just as well

buildParameters also builds the DispatchInfo_t --add constructor there as well?

We can probably get rid of DispatchInfo_t cause its pointless to keep it around (the way you work out dispatch sizes is really simple),ca just have a dispatchHelper which takes passAxis/passIndex

P.S. we still need a function to tell us how many passes we need.

Comment on lines 20 to 33
layout (set = _NBL_GLSL_EXT_BLUR_INPUT_SET_DEFINED_, binding = _NBL_GLSL_EXT_BLUR_INPUT_BINDING_DEFINED_, std430) restrict readonly buffer InputBuffer
{
float in_values[];
};

#endif

#ifndef _NBL_GLSL_EXT_BLUR_OUTPUT_DESCRIPTOR_DEFINED_
#define _NBL_GLSL_EXT_BLUR_OUTPUT_DESCRIPTOR_DEFINED_

layout (set = _NBL_GLSL_EXT_BLUR_OUTPUT_SET_DEFINED_, binding = _NBL_GLSL_EXT_BLUR_OUTPUT_BINDING_DEFINED_, std430) restrict writeonly buffer OutputBuffer
{
float out_values[];
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want the default input and output descriptors to be float SSBOs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, summed area tables require LOTS of precision, also the only way to use 16bit types would be as STBs (imageBuffer) of R16_UNORM (16bit types in SSBOs require non-ubiquitous extensions).

Realistically speaking, whenever you use a 8 bit format, your LUT would need to be 16 or 32 bit, and whenever you use 16bit your LUT really needs to be 32bit

In fact float isnt that great, a "software" 32bit UNORM would have been better but we would have had to store a float denormalization constant per row.

What we should really be asking ourselves is "should we be giving double or uint64_t unorm as a intermediate storage option" ?

P.S. you can write this down as a comment, but please dont do anything about it now.

Comment on lines 12 to 37
CBlurPerformer::CBlurPerformer(video::IVideoDriver* driver, uint32_t maxDimensionSize, bool useHalfStorage)
: m_maxBlurLen(maxDimensionSize), m_halfFloatStorage(useHalfStorage)
{
static IGPUDescriptorSetLayout::SBinding bnd[] =
{
{
0u,
EDT_STORAGE_BUFFER,
1u,
ISpecializedShader::ESS_COMPUTE,
nullptr
},
{
1u,
EDT_STORAGE_BUFFER,
1u,
ISpecializedShader::ESS_COMPUTE,
nullptr
},
};

m_dsLayout = driver->createGPUDescriptorSetLayout(bnd, bnd + sizeof(bnd) / sizeof(IGPUDescriptorSetLayout::SBinding));

auto pcRange = getDefaultPushConstantRanges();
m_pplnLayout = driver->createGPUPipelineLayout(pcRange.begin(), pcRange.end(), core::smart_refctd_ptr(m_dsLayout));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should create a static create function instead of a constructor?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, agreed 100%

…lly pulling out values and, separate out a descriptors.glsl from default_compute_blur.comp.
…mit, and utility for creating specialized shaders.
@devshgraphicsprogramming
Copy link
Member

be consitent with naming, the macros/defines should have EXT_CENTRAL_LIMIT_BOX_BLUR instead of EXT_BLUR

the nbl_glsl_ext_Blur should be nbl_glsl_ext_CentralLimitBoxBlur as well

Comment on lines +4 to +6
#ifdef __cplusplus
#define uint uint32_t
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't even use this

Comment on lines +10 to +27
namespace nbl
{
namespace ext
{
namespace CentralLimitBoxBlur
{

struct uvec4
{
uint32_t x, y, z, w;
};
#include "nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl"

class CBlurPerformer final : public core::IReferenceCounted
{
public:
_NBL_STATIC_INLINE_CONSTEXPR uint32_t DefaultWorkgroupSize = 256u;
_NBL_STATIC_INLINE_CONSTEXPR uint32_t PassesPerAxis = 3u;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few syntax pointers:

  • use nested namespace, like this namespace nbl::ext::CentralLimitBoxBlur
  • you can write static inline constexpr now, no need for the _NBL_STATIC_INLINE_CONSTEXPR

Comment on lines +91 to +92
// At this point I'm wondering do you even need both input and output strides
params.output_strides = params.input_strides;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was to be able to transpose the array after every pass so that the stores are coalesced (match invocation ID increase with output memory address increase) just like we do in the Blit

If you can deduce the input strides from output strides somehow, great.

in reality we don't need 8 separate values, we probably only need 4:

  • YZ, ZX, XY products
  • XYZ full product

core::smart_refctd_ptr<video::IGPUComputePipeline> m_ppln = nullptr;

uint32_t m_maxBlurLen;
bool m_halfFloatStorage;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused variable?

auto pcRange = getDefaultPushConstantRanges();
m_pplnLayout = device->createPipelineLayout(pcRange.begin(), pcRange.end(), core::smart_refctd_ptr(m_dsLayout));

auto specShader = createSpecializedShader("nbl/builtin/glsl/ext/CentralLimitBoxBlur/default_compute_blur.comp", m_maxBlurLen, useHalfStorage ? 1u : 0u, device);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need m_halfFloatStorage afterward?

Comment on lines +56 to +65
core::smart_refctd_ptr<video::IGPUSpecializedShader> CBlurPerformer::createSpecializedShader(const char* shaderIncludePath, const uint32_t axisDim, const bool useHalfStorage, video::ILogicalDevice* device)
{
std::ostringstream shaderSourceStream;
shaderSourceStream
<< "#version 460 core\n"
<< "#define _NBL_GLSL_WORKGROUP_SIZE_ " << DefaultWorkgroupSize << "\n" // Todo(achal): Get the workgroup size from outside
<< "#define _NBL_GLSL_EXT_BLUR_PASSES_PER_AXIS_ " << PassesPerAxis << "\n" // Todo(achal): Get this from outside?
<< "#define _NBL_GLSL_EXT_BLUR_AXIS_DIM_ " << axisDim << "\n"
<< "#define _NBL_GLSL_EXT_BLUR_HALF_STORAGE_ " << (useHalfStorage ? 1 : 0) << "\n"
<< "#include \"" << shaderIncludePath << "\"\n";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you know shaderIncludePath, dont let it be a parameter

Comment on lines +1 to +3
#ifndef _NBL_BUILTIN_GLSL_WORKGROUP_SHARED_BLUR_INCLUDED_
#define _NBL_BUILTIN_GLSL_WORKGROUP_SHARED_BLUR_INCLUDED_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this include is in the wrong place and "namespace", should be in ext/CentralLimitBoxBlur

Comment on lines +36 to +57
const uint strided_idx = nbl_glsl_dot(uvec4(coordinate, channel), nbl_glsl_ext_Blur_Parameters_t_getInputStrides());

float data = 0.f;
if (all(lessThan(coordinate, dims)))
data = in_values[strided_idx];

return data;
}

#endif

#ifndef _NBL_GLSL_EXT_BLUR_SET_DATA_DEFINED_
#define _NBL_GLSL_EXT_BLUR_SET_DATA_DEFINED_

void nbl_glsl_ext_Blur_setData(in uvec3 coordinate, in uint channel, in float val)
{
const uint channel_count = nbl_glsl_ext_Blur_Parameters_t_getChannelCount();
const uvec3 dims = nbl_glsl_ext_Blur_Parameters_t_getDimensions();

if (all(lessThan(coordinate, dims)))
{
const uint strided_idx = nbl_glsl_dot(uvec4(coordinate, channel), nbl_glsl_ext_Blur_Parameters_t_getOutputStrides());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use our new snakeCurve addressing functions instead of nbl_glsl_dot ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants