May I kindly ask why the swizzle parameters in CUTLASS are specifically set to 3, 4, and 3 (`Swizzle<3,4,3>`), as in the K-major GMMA layout atoms quoted here? I would greatly appreciate any insight into the rationale behind this design choice. Thank you so much in advance!
// K-major GMMA layouts in units of bits
using Layout_K_INTER_Atom_Bits = ComposedLayout<Swizzle<0,4,3>, smem_ptr_flag, Layout<Shape<_8, _128>,Stride< _128,_1>>>;
using Layout_K_SW32_Atom_Bits = ComposedLayout<Swizzle<1,4,3>, smem_ptr_flag, Layout<Shape<_8, _256>,Stride< _256,_1>>>;
using Layout_K_SW64_Atom_Bits = ComposedLayout<Swizzle<2,4,3>, smem_ptr_flag, Layout<Shape<_8, _512>,Stride< _512,_1>>>;
using Layout_K_SW128_Atom_Bits = ComposedLayout<Swizzle<3,4,3>, smem_ptr_flag, Layout<Shape<_8,_1024>,Stride<_1024,_1>>>;
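For readers trying to decode the three numbers: as I read the CuTe sources, `Swizzle<B,M,S>` XORs `B` bits taken from position `M+S` of an offset into the `B` bits starting at position `M`, leaving the lowest `M` bits untouched. The sketch below re-implements only that XOR pattern as a standalone function; `swizzle_apply` and `byte_addr` are hypothetical helper names, not CUTLASS APIs, and the sketch assumes the swizzle acts on byte addresses of the shared-memory pointer and that `S >= B >= 0`.

```cpp
#include <cstdint>

// Standalone sketch of the XOR pattern behind cute::Swizzle<B, M, S>.
// Hypothetical helper, not the CUTLASS API; covers only S >= B >= 0.
template <int B, int M, int S>
constexpr std::uint32_t swizzle_apply(std::uint32_t offset) {
  static_assert(B >= 0 && M >= 0 && S >= B, "sketch covers the non-overlapping case only");
  constexpr std::uint32_t bit_mask = (1u << B) - 1u;      // B low bits set
  constexpr std::uint32_t src_mask = bit_mask << (M + S); // bits [M+S, M+S+B)
  // Move the source bits down by S so they line up with bits [M, M+B), then XOR.
  return offset ^ ((offset & src_mask) >> S);
}

// Assumption: byte address of 16-byte chunk `chunk` within 128-byte row `row`.
constexpr std::uint32_t byte_addr(std::uint32_t row, std::uint32_t chunk) {
  return row * 128u + chunk * 16u;
}

// Row 0 is left unchanged; in row r, chunk c lands at chunk (c XOR r).
static_assert(swizzle_apply<3, 4, 3>(byte_addr(0, 5)) == byte_addr(0, 5), "row 0 is identity");
static_assert(swizzle_apply<3, 4, 3>(byte_addr(1, 0)) == byte_addr(1, 1), "0 ^ 1 == 1");
static_assert(swizzle_apply<3, 4, 3>(byte_addr(3, 5)) == byte_addr(3, 6), "5 ^ 3 == 6");
```

Under those assumptions, `M = 4` keeps each 2^4 = 16-byte chunk contiguous, `S = 3` reaches bit 7, which is the row index when rows are 128 bytes apart, and `B` selects how many row bits take part: 3 for the 128-byte pattern, 2 for 64 bytes, 1 for 32 bytes, and 0 for no swizzling at all. Whether this matches the authors' actual motivation is best confirmed against the CUTLASS documentation.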
Thank you so much! Your post has been incredibly helpful! I actually studied L2 persistent before and remember coming across your post at that time. I'm part of a very active NV WeChat discussion group. I was wondering if you might be interested in joining us to explore CUDA technologies together? My WeChat ID is hermit_purple1.