[QST]Why Does CUTLASS Use 3-4-3 Swizzle? #2015

Open
ziyuhuang123 opened this issue Dec 27, 2024 · 2 comments

Comments

@ziyuhuang123

May I kindly ask why the swizzle configuration in CUTLASS is specifically set to 3, 4, and 3? I would greatly appreciate any insights or explanations regarding the rationale behind this design choice. Thank you so much in advance!

// K-major GMMA layouts in units of bits
using Layout_K_INTER_Atom_Bits  = ComposedLayout<Swizzle<0,4,3>, smem_ptr_flag, Layout<Shape<_8, _128>,Stride< _128,_1>>>;
using Layout_K_SW32_Atom_Bits   = ComposedLayout<Swizzle<1,4,3>, smem_ptr_flag, Layout<Shape<_8, _256>,Stride< _256,_1>>>;
using Layout_K_SW64_Atom_Bits   = ComposedLayout<Swizzle<2,4,3>, smem_ptr_flag, Layout<Shape<_8, _512>,Stride< _512,_1>>>;
using Layout_K_SW128_Atom_Bits  = ComposedLayout<Swizzle<3,4,3>, smem_ptr_flag, Layout<Shape<_8,_1024>,Stride<_1024,_1>>>;
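
For readers following the question, here is a minimal sketch of what the three `Swizzle<B, M, S>` parameters do to an offset. It is not the CuTe class itself: `swizzle_offset` and `chunk_after_swizzle` are hypothetical stand-alone helpers, the formula covers only a positive shift `S`, and it assumes the swizzle is applied to byte offsets with 128-byte rows (the 1024-bit rows of the SW128 atom). Under those assumptions, `M = 4` leaves the low 4 bits untouched so each 16-byte chunk stays contiguous, `B` is the number of bits that get XORed (0, 1, 2, or 3 for INTER, SW32, SW64, SW128), and `S = 3` selects the source bits that sit 3 positions above the target bits.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption, not the CuTe API): XOR the b_bits bits
// starting at bit (m_base + s_shift) into the b_bits bits starting at m_base.
constexpr std::uint32_t swizzle_offset(std::uint32_t offset,
                                       int b_bits, int m_base, int s_shift) {
    // Mask selecting the source bits [m_base + s_shift, m_base + s_shift + b_bits).
    const std::uint32_t mask = ((1u << b_bits) - 1u) << (m_base + s_shift);
    // Shift the source bits down onto the target bits and XOR them in.
    return offset ^ ((offset & mask) >> s_shift);
}

// Hypothetical helper: index of the 16-byte chunk that (row, chunk) lands on,
// assuming 128-byte rows and M = 4, S = 3 as in the layouts above.
constexpr std::uint32_t chunk_after_swizzle(int b_bits,
                                            std::uint32_t row,
                                            std::uint32_t chunk) {
    const std::uint32_t byte_offset = row * 128u + chunk * 16u;
    return (swizzle_offset(byte_offset, b_bits, 4, 3) >> 4) & 7u;
}

// B = 0 (the INTER case): no bits are XORed, so the offset is unchanged.
static_assert(swizzle_offset(0x1A5, 0, 4, 3) == 0x1A5, "B = 0 is the identity");

// B = 3 (the SW128 case): the chunk index is XORed with the row index.
static_assert(chunk_after_swizzle(3, 5, 2) == (2u ^ 5u), "chunk ^ row");
```

With `B = 3` under these assumptions, chunk `c` of row `r` moves to chunk `c ^ r`, so the eight rows of one column of chunks land on eight different chunk positions, which is the usual motivation for this pattern when avoiding shared-memory bank conflicts.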
@leimao
Contributor

leimao commented Dec 28, 2024

I have my own interpretations.

@ziyuhuang123
Author

> I have my own interpretations.

Thank you so much! Your post has been incredibly helpful! I actually studied L2 persistent before and remember coming across your post at that time. I'm part of a very active NV WeChat discussion group. I was wondering if you might be interested in joining us to explore CUDA technologies together? My WeChat ID is hermit_purple1.
