-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename ext::Blur
to ext::CentralLimitBoxBlur
and rework ex 25
#115
Rename ext::Blur
to ext::CentralLimitBoxBlur
and rework ex 25
#115
Conversation
…it doesn't compile because there is no nbl_glsl_workgroupBroadcast function yet for floats --that change is coming in the next commit.
Merge to
|
#define _NBL_GLSL_EXT_BLUR_GET_PARAMETERS_DECLARED_ | ||
#endif | ||
|
||
#ifndef _NBL_GLSL_EXT_BLUR_PARAMETERS_METHODS_DEFINED_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they should have a forward declaration in all honesty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it, forward declared where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either you forward declare the getters, like this
uvec3 nbl_glsl_ext_Blur_Parameters_t_getDimensions();
and define later under an ifdef like you currently do
or do like we currently do for "pseudo methods" (stuff that could be struct methods in HLSL or C)
uvec3 nbl_glsl_ext_Blur_Parameters_t_getDimensions(in nbl_glsl_ext_Blur_Parameters_t this)
{
return this.input_dimensions.xyz;
}
#ifndef _NBL_EXT_BLUR_C_BLUR_PERFORMER_INCLUDED_ | ||
#define _NBL_EXT_BLUR_C_BLUR_PERFORMER_INCLUDED_ | ||
|
||
#include "nabla.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AnastaZIuk tell us if this is the proper way to include nabla for an extension
struct alignas(16) uvec4 | ||
{ | ||
uint x, y, z, w; | ||
}; | ||
#include "nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enclose in namespace impl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead make this private
to CBlurPeformer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would that even work if I wanted to access/muate these members from outside of the class?
I mean the type definition is private, how would that work!? (godbolt example to prove its possible please)
P.S. Does GPU blit have a bug then?
static inline void defaultBarrier() | ||
{ | ||
video::COpenGLExtensionHandler::extGlMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to provide barrier APIs similar to the blit extension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the blit API has this.
Do we want to take a similar route to CScanner
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please, but what does the blit API have for helping with barriers then?
_NBL_STATIC_INLINE_CONSTEXPR uint32_t DEFAULT_WORKGROUP_SIZE = 256u; | ||
_NBL_STATIC_INLINE_CONSTEXPR uint32_t PASSES_PER_AXIS = 3u; | ||
|
||
typedef nbl_glsl_ext_Blur_Parameters_t Parameters_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use this cool trick
struct Parameters_t : impl::nbl_glsl_ext_Blur_Parameters_t
{
};
then we can turn buildParameters
from a static function to a constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this.
Then it can just be:
struct Parameters_t : nbl_glsl_ext_Blur_Parameters_t
{
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could, but I'm a little hazy about a public class inheriting from a private one
static inline uint32_t buildParameters(uint32_t numChannels, const asset::VkExtent3D& inputDimensions, Parameters_t* outParams, DispatchInfo_t* outInfos, | ||
const float radius, const asset::ISampler::E_TEXTURE_CLAMP* wrappingType, const asset::ISampler::E_TEXTURE_BORDER_COLOR* borderColor = nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
document what unit the radius is measured in
turn this into a Parameters_t
constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
document what unit the radius is measured in
Sure, but it is measured in NDC space, isn't it?
turn this into a Parameters_t constructor
buildParameters
also builds the DispatchInfo_t
--add constructor there as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but it is measured in NDC space, isn't it
I don't remember anymore, please deduce, could be normalized tex coord just as well
buildParameters also builds the DispatchInfo_t --add constructor there as well?
We can probably get rid of DispatchInfo_t
cause its pointless to keep it around (the way you work out dispatch sizes is really simple),ca just have a dispatchHelper
which takes passAxis
/passIndex
P.S. we still need a function to tell us how many passes we need.
layout (set = _NBL_GLSL_EXT_BLUR_INPUT_SET_DEFINED_, binding = _NBL_GLSL_EXT_BLUR_INPUT_BINDING_DEFINED_, std430) restrict readonly buffer InputBuffer | ||
{ | ||
float in_values[]; | ||
}; | ||
|
||
#endif | ||
|
||
#ifndef _NBL_GLSL_EXT_BLUR_OUTPUT_DESCRIPTOR_DEFINED_ | ||
#define _NBL_GLSL_EXT_BLUR_OUTPUT_DESCRIPTOR_DEFINED_ | ||
|
||
layout (set = _NBL_GLSL_EXT_BLUR_OUTPUT_SET_DEFINED_, binding = _NBL_GLSL_EXT_BLUR_OUTPUT_BINDING_DEFINED_, std430) restrict writeonly buffer OutputBuffer | ||
{ | ||
float out_values[]; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want the default input and output descriptors to be float
SSBOs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, summed area tables require LOTS of precision, also the only way to use 16bit types would be as STBs (imageBuffer) of R16_UNORM (16bit types in SSBOs require non-ubiquitous extensions).
Realistically speaking, whenever you use a 8 bit format, your LUT would need to be 16 or 32 bit, and whenever you use 16bit your LUT really needs to be 32bit
In fact float
isnt that great, a "software" 32bit UNORM would have been better but we would have had to store a float
denormalization constant per row.
What we should really be asking ourselves is "should we be giving double
or uint64_t
unorm as a intermediate storage option" ?
P.S. you can write this down as a comment, but please dont do anything about it now.
CBlurPerformer::CBlurPerformer(video::IVideoDriver* driver, uint32_t maxDimensionSize, bool useHalfStorage) | ||
: m_maxBlurLen(maxDimensionSize), m_halfFloatStorage(useHalfStorage) | ||
{ | ||
static IGPUDescriptorSetLayout::SBinding bnd[] = | ||
{ | ||
{ | ||
0u, | ||
EDT_STORAGE_BUFFER, | ||
1u, | ||
ISpecializedShader::ESS_COMPUTE, | ||
nullptr | ||
}, | ||
{ | ||
1u, | ||
EDT_STORAGE_BUFFER, | ||
1u, | ||
ISpecializedShader::ESS_COMPUTE, | ||
nullptr | ||
}, | ||
}; | ||
|
||
m_dsLayout = driver->createGPUDescriptorSetLayout(bnd, bnd + sizeof(bnd) / sizeof(IGPUDescriptorSetLayout::SBinding)); | ||
|
||
auto pcRange = getDefaultPushConstantRanges(); | ||
m_pplnLayout = driver->createGPUPipelineLayout(pcRange.begin(), pcRange.end(), core::smart_refctd_ptr(m_dsLayout)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should create a static
create
function instead of a constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, agreed 100%
…lly pulling out values and, separate out a descriptors.glsl from default_compute_blur.comp.
…mit, and utility for creating specialized shaders.
be consitent with naming, the macros/defines should have the |
#ifdef __cplusplus | ||
#define uint uint32_t | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't even use this
namespace nbl | ||
{ | ||
namespace ext | ||
{ | ||
namespace CentralLimitBoxBlur | ||
{ | ||
|
||
struct uvec4 | ||
{ | ||
uint32_t x, y, z, w; | ||
}; | ||
#include "nbl/builtin/glsl/ext/CentralLimitBoxBlur/parameters_struct.glsl" | ||
|
||
class CBlurPerformer final : public core::IReferenceCounted | ||
{ | ||
public: | ||
_NBL_STATIC_INLINE_CONSTEXPR uint32_t DefaultWorkgroupSize = 256u; | ||
_NBL_STATIC_INLINE_CONSTEXPR uint32_t PassesPerAxis = 3u; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few syntax pointers:
- use nested namespace, like this
namespace nbl::ext::CentralLimitBoxBlur
- you can write
static inline constexpr
now, no need for the_NBL_STATIC_INLINE_CONSTEXPR
// At this point I'm wondering do you even need both input and output strides | ||
params.output_strides = params.input_strides; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea was to be able to transpose the array after every pass so that the stores are coalesced (match invocation ID increase with output memory address increase) just like we do in the Blit
If you can deduce the input strides from output strides somehow, great.
in reality we don't need 8 separate values, we probably only need 4:
- YZ, ZX, XY products
- XYZ full product
core::smart_refctd_ptr<video::IGPUComputePipeline> m_ppln = nullptr; | ||
|
||
uint32_t m_maxBlurLen; | ||
bool m_halfFloatStorage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused variable?
auto pcRange = getDefaultPushConstantRanges(); | ||
m_pplnLayout = device->createPipelineLayout(pcRange.begin(), pcRange.end(), core::smart_refctd_ptr(m_dsLayout)); | ||
|
||
auto specShader = createSpecializedShader("nbl/builtin/glsl/ext/CentralLimitBoxBlur/default_compute_blur.comp", m_maxBlurLen, useHalfStorage ? 1u : 0u, device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need m_halfFloatStorage
afterward?
core::smart_refctd_ptr<video::IGPUSpecializedShader> CBlurPerformer::createSpecializedShader(const char* shaderIncludePath, const uint32_t axisDim, const bool useHalfStorage, video::ILogicalDevice* device) | ||
{ | ||
std::ostringstream shaderSourceStream; | ||
shaderSourceStream | ||
<< "#version 460 core\n" | ||
<< "#define _NBL_GLSL_WORKGROUP_SIZE_ " << DefaultWorkgroupSize << "\n" // Todo(achal): Get the workgroup size from outside | ||
<< "#define _NBL_GLSL_EXT_BLUR_PASSES_PER_AXIS_ " << PassesPerAxis << "\n" // Todo(achal): Get this from outside? | ||
<< "#define _NBL_GLSL_EXT_BLUR_AXIS_DIM_ " << axisDim << "\n" | ||
<< "#define _NBL_GLSL_EXT_BLUR_HALF_STORAGE_ " << (useHalfStorage ? 1 : 0) << "\n" | ||
<< "#include \"" << shaderIncludePath << "\"\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you know shaderIncludePath
, dont let it be a parameter
#ifndef _NBL_BUILTIN_GLSL_WORKGROUP_SHARED_BLUR_INCLUDED_ | ||
#define _NBL_BUILTIN_GLSL_WORKGROUP_SHARED_BLUR_INCLUDED_ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this include is in the wrong place and "namespace", should be in ext/CentralLimitBoxBlur
const uint strided_idx = nbl_glsl_dot(uvec4(coordinate, channel), nbl_glsl_ext_Blur_Parameters_t_getInputStrides()); | ||
|
||
float data = 0.f; | ||
if (all(lessThan(coordinate, dims))) | ||
data = in_values[strided_idx]; | ||
|
||
return data; | ||
} | ||
|
||
#endif | ||
|
||
#ifndef _NBL_GLSL_EXT_BLUR_SET_DATA_DEFINED_ | ||
#define _NBL_GLSL_EXT_BLUR_SET_DATA_DEFINED_ | ||
|
||
void nbl_glsl_ext_Blur_setData(in uvec3 coordinate, in uint channel, in float val) | ||
{ | ||
const uint channel_count = nbl_glsl_ext_Blur_Parameters_t_getChannelCount(); | ||
const uvec3 dims = nbl_glsl_ext_Blur_Parameters_t_getDimensions(); | ||
|
||
if (all(lessThan(coordinate, dims))) | ||
{ | ||
const uint strided_idx = nbl_glsl_dot(uvec4(coordinate, channel), nbl_glsl_ext_Blur_Parameters_t_getOutputStrides()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use our new snakeCurve
addressing functions instead of nbl_glsl_dot
?
a07e79f
to
ffbd843
Compare
For #61
I still gotta fix the mirror wrapping case (which I'll fix soon), apart from that it is ready for reviewSome changes are exactly the same as GPU Radix Sort #75 i.e. addition ofnbl_glsl_workgroupBroadcast
for floating point numbers