[QST] Question About Non-owning Tensor #1865

Closed
ZhangZhiPku opened this issue Oct 12, 2024 · 1 comment

@ZhangZhiPku

In the tutorial:
https://github.com/NVIDIA/cutlass/blob/main/media/docs/cute/0y_predication.md

We have the following code snippet:

Tensor cA   = make_identity_tensor(make_shape(size<0>(sA), size<1>(sA)));  // (BLK_M,BLK_K) -> (blk_m,blk_k)
Tensor tAcA = local_partition(cA, tA, thread_idx);

Tensor cB   = make_identity_tensor(make_shape(size<0>(sB), size<1>(sB)));  // (BLK_N,BLK_K) -> (blk_n,blk_k)
Tensor tBcB = local_partition(cB, tB, thread_idx);

// Populate
CUTE_UNROLL
for (int m = 0; m < size<0>(tApA); ++m) {
  tApA(m,0) = get<0>(tAcA(m,0)) < m_max_coord;
}
CUTE_UNROLL
for (int n = 0; n < size<0>(tBpB); ++n) {
  tBpB(n,0) = get<0>(tBcB(n,0)) < n_max_coord;
}

In the above code, we create two predicate tensors, cA and cB.
I found that calling make_identity_tensor actually creates a tensor view (cA, cB). No memory is allocated to store the contents of cA or cB; instead, an iterator is created (https://github.com/NVIDIA/cutlass/blob/cc3c29a81a140f7b97045718fb88eb0664c37bd7/include/cute/tensor_impl.hpp).

So why are we able to modify the cA and cB matrices in the tutorial code with:

tApA(m,0) = get<0>(tAcA(m,0)) < m_max_coord;
tBpB(n,0) = get<0>(tBcB(n,0)) < n_max_coord;

Does this call implicitly convert the cB matrix from a tensor view into a tensor with actual allocated memory?

@ccecka

ccecka commented Oct 13, 2024

tAcA is the (tiled, sliced, partitioned) coordinate tensor. You're right that it is read-only.

tApA is the predicate tensor that actually stores bool values:

Tensor tApA = make_tensor<bool>(shape(tAcA));

which is read/write. So the above code is precomputing and storing predicates (to reuse in a mainloop, for example) via the read-only coordinate tensor.

I agree that I should revisit and update that documentation, though.
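
For readers who want to see the read-only versus read/write split in isolation, here is a minimal sketch in plain C++. It is an analogy only and does not use CuTe: `IdentityView`, `PredicateBuffer`, `populate_predicates`, and the `row_offset` parameter are hypothetical names invented for illustration, not part of the CUTLASS API. The view computes each coordinate on the fly and owns no storage, while the predicate buffer owns its bools and is the only thing assigned to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a coordinate (identity) tensor.
// It owns no storage: every read computes the coordinate from the index.
struct IdentityView {
    int rows;
    int cols;

    // Returns the coordinate by value, so there is nothing to assign to.
    std::array<int, 2> operator()(int m, int k) const {
        assert(m >= 0 && m < rows && k >= 0 && k < cols);
        return {m, k};
    }
};

// Hypothetical stand-in for the owning predicate tensor.
// It holds real storage, so elements can be read and written.
template <std::size_t M>
struct PredicateBuffer {
    std::array<bool, M> data{};

    bool& operator()(std::size_t m) { return data[m]; }
    bool operator()(std::size_t m) const { return data[m]; }
};

// Reads coordinates from the view and stores one predicate per row.
// row_offset (an assumption of this sketch) models the first row that
// belongs to the current partition.
template <std::size_t M>
PredicateBuffer<M> populate_predicates(const IdentityView& coords,
                                       int row_offset,
                                       int m_max_coord) {
    PredicateBuffer<M> pred;
    for (std::size_t m = 0; m < M; ++m) {
        // Read from the coordinate view, write into the predicate buffer.
        const int row = coords(row_offset + static_cast<int>(m), 0)[0];
        pred(m) = row < m_max_coord;
    }
    return pred;
}
```

In this sketch the assignment target is always the buffer, never the view, which mirrors the point that tApA is written while tAcA is only read.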
