RMA WG 06 07 2018
- Contexts Questions (Nick)
- Naveen (Cray); Min Si (ANL); Swaroop, Thomas and Manju (ORNL); Jim, Dave, Alex, and Wasi (Intel); Tony (SBU); Khaled (AMD); Nick (DOD); Anshuman Goswami (NVIDIA)
- Ask users whether they need the 8-, 16-, 32-, 64-, and 128-bit (and even "mem") versions of the put-with-signal function. Are these still needed given that we have type support for fixed-size integers? Jim says these may be connected to Fortran type "kinds" (e.g. integer(kind=8)).
- Poke NVIDIA folks to check whether this can be implemented using GPUDirect.
There can be big benefits to ensuring alignment in some cases, so keeping the sized versions is useful.
Naveen asked whether a 0-count put-with-signal still needs to send the signal. Consensus seems to be "yes", but the spec needs to be explicit about this.
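The zero-count case can be illustrated with a minimal single-process sketch in plain C11. This is not an OpenSHMEM implementation: `mock_put_signal` and `demo_zero_count_signal` are hypothetical names invented here, and a local `memcpy` followed by an atomic release store stands in for the remote data transfer and signal update. It only shows the behavior the group leaned toward, namely that the signal is still delivered when no data is moved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local stand-in for a put-with-signal; NOT an OpenSHMEM call.
 * Copies nbytes from src to dest, then updates the signal word. */
static void mock_put_signal(void *dest, const void *src, size_t nbytes,
                            _Atomic uint64_t *sig, uint64_t sig_value)
{
    assert(sig != NULL);

    /* Skip the copy entirely for a zero-count put. */
    if (nbytes > 0) {
        memcpy(dest, src, nbytes);
    }

    /* The signal update happens unconditionally, even when nbytes == 0.
     * The release store orders it after the data copy above. */
    atomic_store_explicit(sig, sig_value, memory_order_release);
}

/* Issues a zero-count put and returns the signal value seen afterwards. */
static uint64_t demo_zero_count_signal(void)
{
    _Atomic uint64_t sig = 0;
    char dest[4] = { 'a', 'b', 'c', 'd' };

    mock_put_signal(dest, NULL, 0, &sig, 1);

    /* The destination buffer is untouched by a zero-count put. */
    assert(dest[0] == 'a');

    return atomic_load_explicit(&sig, memory_order_acquire);
}
```

Under these assumptions, a waiter that polls the signal word is released by a zero-count put exactly as it would be by a put that carries data.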
Anshuman mentioned peer-to-peer GPU issues with loads/stores of different strengths (e.g. on Volta) and the use of thread_fences. An interposed read to the GPU BAR would enforce ordering. There was discussion about the performance impact of multiple operations; Anshuman said large transfers would swamp the extra operations. The overall opinion seems to be that p+s (put-with-signal) would not be worse than pfp (put, fence, put). The GPU might need a persistent thread to catch the signal, and Anshuman thinks there might be issues with deadlock in the GPU runtime. He will come up with examples.
Khaled indicated that AMD GPUs seem to have the same limitations at the moment but later versions will not; this is probably true of NVIDIA too. (TONY Q: did I get this bit right?)
Khaled/Manju/Jim discussed immediate operations on different networks (IBV, OPA) for reducing the number of operations/network transfers. OPA can do onloaded immediate ops via Active Messages.
Add a non-blocking version of fence: libfabric has triggered ops with deferred work queues that can perform the signal later. (TONY addition: UCX has callbacks in NB routines that could probably do the same thing.)
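The deferred-signal idea can be sketched as a counter with a threshold. This is a single-process mock in plain C, not the libfabric or UCX API: `mock_deferred_signal`, `mock_defer_signal`, and `mock_put_complete` are hypothetical names invented for illustration. It assumes the signal should fire once a given number of earlier puts have completed, without the initiator blocking in a fence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deferred work item: fires a signal update once enough
 * earlier operations have completed. NOT a libfabric or UCX type. */
typedef struct {
    uint64_t completed;  /* operations completed so far */
    uint64_t threshold;  /* fire when completed >= threshold */
    uint64_t *sig;       /* signal word to update when triggered */
    uint64_t sig_value;  /* value to write into *sig */
    int armed;           /* nonzero while the signal is still pending */
} mock_deferred_signal;

/* Queue the signal and return immediately (the "non-blocking fence"). */
static void mock_defer_signal(mock_deferred_signal *d, uint64_t threshold,
                              uint64_t *sig, uint64_t sig_value)
{
    assert(d != NULL && sig != NULL);
    d->completed = 0;
    d->threshold = threshold;
    d->sig = sig;
    d->sig_value = sig_value;
    d->armed = 1;
}

/* Called once per completed put; triggers the signal at the threshold. */
static void mock_put_complete(mock_deferred_signal *d)
{
    d->completed++;
    if (d->armed && d->completed >= d->threshold) {
        *d->sig = d->sig_value;
        d->armed = 0;  /* fire exactly once */
    }
}
```

With a threshold of 2, the signal word stays unchanged after the first completion and is written after the second, which is the ordering a blocking fence between the puts and the signal would otherwise provide.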
Anshuman added that peer-to-peer may not be the same op as p+s, depending on different load/store strengths. Action item: Anshuman will double-check.
This corresponds to Issue #221
Nick would like a compile-time constant that can initialize new contexts that have not been created yet. This is for use in libraries that want idempotent behaviors. Proposes use of constant such as SHMEM_CTX_INVALID rather than SHMEM_CTX_NULL since we're not requiring any knowledge of what shmem_ctx_t is (probably a pointer, but doesn't have to be). A shmem_ctx_destroy on an "invalid" context is a no-op.
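The library use case can be sketched with a mock in plain C. Everything here is hypothetical: `mock_ctx_t`, `MOCK_CTX_INVALID`, `mock_ctx_create`, `mock_ctx_destroy`, `lib_init`, and `lib_finalize` are illustrative stand-ins, not OpenSHMEM names. The handle is deliberately an integer rather than a pointer, to reflect the point that callers should not assume what the handle type is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opaque handle; an integer here, but callers must not care. */
typedef int mock_ctx_t;

/* Compile-time sentinel usable in static initializers. */
#define MOCK_CTX_INVALID (-1)
#define MOCK_MAX_CTX 4

static bool mock_ctx_live[MOCK_MAX_CTX];

/* Returns 0 on success; on failure leaves *ctx set to the sentinel. */
static int mock_ctx_create(mock_ctx_t *ctx)
{
    for (int i = 0; i < MOCK_MAX_CTX; i++) {
        if (!mock_ctx_live[i]) {
            mock_ctx_live[i] = true;
            *ctx = i;
            return 0;
        }
    }
    *ctx = MOCK_CTX_INVALID;
    return 1;
}

/* Destroying the sentinel is a no-op, as proposed in the discussion. */
static void mock_ctx_destroy(mock_ctx_t ctx)
{
    if (ctx == MOCK_CTX_INVALID) {
        return;
    }
    assert(ctx >= 0 && ctx < MOCK_MAX_CTX);
    mock_ctx_live[ctx] = false;
}

/* Library state, initialized before any context has been created. */
static mock_ctx_t lib_ctx = MOCK_CTX_INVALID;

/* Idempotent: creates the context only on the first call. */
static int lib_init(void)
{
    if (lib_ctx != MOCK_CTX_INVALID) {
        return 0;
    }
    return mock_ctx_create(&lib_ctx);
}

/* Idempotent: safe to call before init or more than once. */
static void lib_finalize(void)
{
    mock_ctx_destroy(lib_ctx);
    lib_ctx = MOCK_CTX_INVALID;
}
```

Because the sentinel is a compile-time constant and destroy accepts it, the library needs no separate "initialized" flag, and calling `lib_finalize` twice or before `lib_init` is harmless.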