July 15 2024

Carl Pearson edited this page Jul 15, 2024 · 5 revisions

Attendees: Evan, Cedric, Gabriel, Hugo, Joseph, Vivek, Carl, Nicole

Note-taker

Carl

General Topics

  • The name
    • Carl: Grateful to resolve this (paperwork for CI, funding proposals, common "branding")
    • Poll conducted over Slack, closed when this meeting started
      • (pre-filled results, double-check when meeting starts)
      • Strongly Kokkos Comm: 3
      • Weakly Kokkos Comm: 5
      • Neutral: 0
      • Weakly Kokkos MPI: 6
      • Strongly Kokkos MPI: 1
    • Results solidly in favor of Kokkos Comm, so we will keep that name

  • Transports/CommunicationSpace PR: #109

    • Definitely want to support NCCL
    • Do we have a preferred space plus a fallback if the primary does not (yet) support an operation?
    • Do we always use the preferred space, or is it a runtime decision?
      • Maybe this hybrid is actually a different CommunicationSpace?
    • The Handle in this PR is not a final design; it was just somewhere to hide the MPI details
  • Starting discussions about semantics for calling Kokkos Comm inside a parallel region

    • Issue #115
    • e.g. NCCL functions cannot be called from the device (today); they are basically CUDA kernel launches. Communication parameters are associated with the communicator, not with individual function calls.
    • Let's just disallow our existing API from being called in parallel regions
      • A future with a split API (some calls allowed in parallel regions, some not) is probable
  • Semantics of execution space on communication handle #108

    • What is the meaning of the param? Waiting on data prep, or for any needed work?
    • Proposal A: require user to fence first, just use the space for any work
      • then we don't have to add the fence ourselves
    • Proposal B: communication is ordered w.r.t. the instance
      • what if the communication takes multiple views where data needs to be available?
        • we say data needs to be ready and will be produced semantically in this space, if you need something fancier then you have to fence yourself
      • Would have to provide a way to skip internal fence, maybe just a boolean argument for now.
    • Proposal C: reworking the API using Kokkos Tasks (a bit undefined, maybe reworked in the future, or gone)
      • Kokkos Graph perhaps?
    • Proposal D: sender/receiver somehow?
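To make the difference between Proposal A and Proposal B concrete, here is a minimal sketch using only the C++ standard library. Everything in it is hypothetical and for illustration only: `MockExecSpace`, `mock_transport_send`, `send_proposal_a`, `send_proposal_b`, and the `skip_fence` argument are not Kokkos or Kokkos Comm APIs. The mock models asynchrony by deferring queued work until `fence()` is called, which is a simplification of how a real execution space instance behaves.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an execution space instance: queued work is
// deferred and only runs when fence() is called (models asynchronous kernels).
struct MockExecSpace {
  std::vector<std::function<void()>> pending;
  int fence_count = 0;

  void enqueue(std::function<void()> work) { pending.push_back(std::move(work)); }

  void fence() {
    for (auto &work : pending) work();
    pending.clear();
    ++fence_count;
  }
};

// Hypothetical stand-in for a transport call: it reads the buffer right away,
// so the data must already be complete. Returns a checksum of what was "sent".
inline int mock_transport_send(const std::vector<int> &buf) {
  int sum = 0;
  for (int v : buf) sum += v;
  return sum;
}

// Proposal A (sketch): the caller must fence first; the library never fences.
// The space would only be used for any internal work, of which there is none here.
inline int send_proposal_a(MockExecSpace & /*space*/, const std::vector<int> &buf) {
  return mock_transport_send(buf);
}

// Proposal B (sketch): communication is ordered w.r.t. the instance, so the
// library fences internally unless the caller opts out with skip_fence.
inline int send_proposal_b(MockExecSpace &space, const std::vector<int> &buf,
                           bool skip_fence = false) {
  if (!skip_fence) space.fence();
  return mock_transport_send(buf);
}
```

Under these assumptions, a caller who enqueues work that fills a buffer and then calls `send_proposal_a` without fencing sends stale data, while `send_proposal_b` sends the finished data at the cost of one internal fence. Passing `skip_fence = true` recovers Proposal A's behavior for callers who have already fenced.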

Working Groups

  • NCCL (Gabriel, Nicole, Evan)
  • CI for GPUs (Carl)
    • will start the paperwork now that the name is decided
  • Applications
  • Modern C++ / MPI
    • sender/receiver for execution space semantics above.
  • Accelerator-Initiated Communication / Support
    • Issue discussion above
  • Smart NICs

Round Table

Nothing else from anyone