July 15 2024
Carl Pearson edited this page Jul 15, 2024
Attendees: Evan, Cedric, Gabriel, Hugo, Joseph, Vivek, Carl, Nicole
Carl
- The name
- Carl: grateful to resolve this (needed for CI paperwork, funding proposals, and common "branding")
- Poll conducted over Slack, closed when this meeting started
- (pre-filled results, double-check when meeting starts)
- Strongly Kokkos Comm: 3
- Weakly Kokkos Comm: 5
- Neutral: 0
- Weakly Kokkos MPI: 6
- Strongly Kokkos MPI: 1
- Results solidly in favor of Kokkos Comm, so we will keep that name
Transports/CommunicationSpace PR: #109
- Definitely want to support NCCL
- Do we have a preferred space plus a fallback if the primary does not (yet) support an operation?
- Do we always use the preferred space, or is it a runtime decision?
- Maybe this hybrid is actually a different CommunicationSpace?
- The Handle in this PR is not a final design; we just had to hide the MPI stuff somewhere
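One way to frame the preferred-plus-fallback question is as a compile-time policy. The C++ sketch below assumes everything it names: `MpiSpace`, `NcclSpace`, the operation tags, the `supports` trait, `select_space_t`, and `HybridSpace` are hypothetical and are not the types in PR #109. It also assumes, purely for illustration, that the NCCL space implements only all-reduce. It models the hybrid as its own space type that picks a backend per operation at compile time rather than at run time.

```cpp
#include <type_traits>

// Hypothetical communication spaces for illustration only; these are NOT
// the Kokkos Comm CommunicationSpace types from PR #109.
struct MpiSpace  { static constexpr const char* name = "MPI"; };
struct NcclSpace { static constexpr const char* name = "NCCL"; };

// Hypothetical operation tags.
struct AllReduceOp {};
struct AllToAllVOp {};

// Trait: does Space implement Op? Defaults to "no".
template <typename Space, typename Op>
struct supports : std::false_type {};

// In this sketch the MPI space implements both operations...
template <> struct supports<MpiSpace, AllReduceOp> : std::true_type {};
template <> struct supports<MpiSpace, AllToAllVOp> : std::true_type {};
// ...while the NCCL space is assumed to implement only all-reduce.
template <> struct supports<NcclSpace, AllReduceOp> : std::true_type {};

// Compile-time policy: use Preferred when it supports Op, else Fallback.
template <typename Preferred, typename Fallback, typename Op>
using select_space_t =
    std::conditional_t<supports<Preferred, Op>::value, Preferred, Fallback>;

// The "hybrid" modeled as its own space type: it makes no runtime decision,
// it just forwards each operation to the space selected above.
template <typename Preferred, typename Fallback>
struct HybridSpace {
  template <typename Op>
  static constexpr const char* backend_for(Op) {
    return select_space_t<Preferred, Fallback, Op>::name;
  }
};
```

With these definitions, `HybridSpace<NcclSpace, MpiSpace>::backend_for(AllToAllVOp{})` returns `"MPI"` because the NCCL space does not support that tag in this sketch. A runtime decision would instead need a query on the communicator or the data (for example, where the buffer lives), which this sketch does not attempt.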
Starting discussions about semantics for calling Kokkos Comm inside a parallel region
- Issue #115
- e.g. NCCL functions cannot (today) be called from the device: they are basically CUDA kernel launches. Communication parameters are associated with the communicator, not with individual function calls.
- Let's just disallow our existing API from being called in parallel regions
- Probably a future where we have a split API (some functions allowed in parallel regions, some not)
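A minimal sketch of how the "disallow in parallel regions" rule could be enforced at run time on the host. `parallel_depth`, `ParallelRegionScope`, `mock_parallel_for`, and `hypothetical_send` are all hypothetical; Kokkos is not assumed to expose such a depth counter, and the sketch only covers host-side dispatch, not device code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical per-thread nesting depth of parallel regions.
// Not a Kokkos API: a real implementation would need runtime support.
inline thread_local int parallel_depth = 0;

// RAII marker that a (hypothetical) parallel dispatch holds while its body runs.
struct ParallelRegionScope {
  ParallelRegionScope() { ++parallel_depth; }
  ~ParallelRegionScope() {
    assert(parallel_depth > 0);  // scopes must be balanced
    --parallel_depth;
  }
  ParallelRegionScope(const ParallelRegionScope&) = delete;
  ParallelRegionScope& operator=(const ParallelRegionScope&) = delete;
};

// Mock "parallel_for": runs serially but marks the body as a parallel region.
template <typename Functor>
void mock_parallel_for(std::size_t n, Functor f) {
  ParallelRegionScope scope;
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// Hypothetical host-only communication call from the existing API.
// It refuses to run inside a parallel region, per the decision above.
inline void hypothetical_send(const double* data, std::size_t count) {
  if (parallel_depth > 0)
    throw std::logic_error("communication called inside a parallel region");
  (void)data;
  (void)count;
  // A real backend (e.g. MPI or NCCL) would start the transfer here.
}

// True if hypothetical_send is rejected when called from a parallel region.
inline bool send_rejected_in_parallel_region() {
  double x = 0.0;
  try {
    mock_parallel_for(1, [&x](std::size_t) { hypothetical_send(&x, 1); });
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

Throwing from inside the body still unwinds `ParallelRegionScope`, so the depth returns to zero afterwards and later host-side calls succeed. A split API would keep a check like this only on the functions that stay host-only.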
Semantics of execution space on communication handle #108
- What does the execution-space parameter mean? Does the communication wait on the data prep in that space, or does the space run any work the communication needs?
- Proposal A: require the user to fence first, and just use the space for any work
- then we don't have to add the fence ourselves
- Proposal B: communication is ordered w.r.t. the instance
- what if the communication takes multiple views where data needs to be available?
- we say data needs to be ready and will be produced semantically in this space, if you need something fancier then you have to fence yourself
- Would have to provide a way to skip internal fence, maybe just a boolean argument for now.
- Proposal C: rework the API using Kokkos Tasks (a bit undefined; Tasks may be reworked in the future, or gone)
- Kokkos Graph perhaps?
- Proposal D: sender/receiver somehow?
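To make the difference between Proposals A and B concrete, here is a small C++ sketch with a mock execution-space instance whose queued work only runs on `fence()`. `MockExecInstance`, `send_proposal_a`, `send_proposal_b`, and the `skip_fence` flag are hypothetical names; the boolean mirrors the "skip internal fence" idea above and is not an agreed API. Under A the caller fences and the function only checks that precondition; under B the function fences the instance itself unless told not to.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an execution space instance (think of a CUDA
// stream). Submitted work is deferred and only runs when fence() is called.
// This is NOT a Kokkos type.
struct MockExecInstance {
  std::vector<std::function<void()>> queue;
  int fences = 0;  // how many times fence() has run

  void submit(std::function<void()> work) { queue.push_back(std::move(work)); }

  void fence() {
    for (auto& work : queue) work();  // complete pending work in order
    queue.clear();
    ++fences;
  }
};

// Proposal A (hypothetical): the caller must fence before communicating.
// The space would only be used for internal work (e.g. packing), so we
// just check the precondition and read the data.
inline std::vector<double> send_proposal_a(MockExecInstance& space,
                                           const std::vector<double>& buf) {
  assert(space.queue.empty() && "Proposal A: caller must fence first");
  return buf;  // stands in for handing buf to the transport
}

// Proposal B (hypothetical): the data is semantically produced in `space`,
// so we fence it ourselves before reading buf, unless the caller opts out.
inline std::vector<double> send_proposal_b(MockExecInstance& space,
                                           const std::vector<double>& buf,
                                           bool skip_fence = false) {
  if (!skip_fence) space.fence();
  return buf;  // stands in for handing buf to the transport
}
```

For example, if `v` is filled by work submitted to `s`, calling `send_proposal_b(s, v)` runs that work before reading `v`, while `send_proposal_b(s, v, true)` is only correct if the caller has already fenced. With several views, Proposal B as sketched assumes they are all produced in the same instance.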
- NCCL (Gabriel, Nicole, Evan)
- CI for GPUs (Carl)
- will start the paperwork now that the name is decided
- Applications
- Modern C++ / MPI
- sender/receiver for execution space semantics above.
- Accelerator-Initiated Communication / Support
- See the Issue #115 discussion above
- Smart NICs
Nothing else from anyone