Async RFC #2551

chaserileyroberts · 2024-09-17T21:16:30Z

This RFC proposes adding several async features to StableHLO, mainly async_start, async_done, and the async<...> value type.

The end goal of this RFC is to let JAX users define their own collective matmul schedules and to give them more control over how they utilize their hardware.

ezhulenev · 2024-09-17T21:21:51Z

rfcs/20240917-async-support.md

+    stablehlo.return %new_sum : tensor<i64>
+} : (tensor<i64>, tensor<i64>) -> async<tensor<i64>>
+
+%result = "stablehlo.async_done"(%future): async<tensor<i64>> -> tensor<i64>


Let's make async_done variadic:

Awaiting all async results maps nicely to the HLO async-done.

Awaiting a single async result could be represented with async-update (needs fact checking), but we don't need to implement that initially.

I like it when the type is defined on the same line as the op name, and I think this should be parseable in MLIR. Also, for done the return type can be inferred from the arguments, so there is no need to spell it out. But these details can be refined later.

// HLO async-start
%f0, %f1 = stablehlo.async_start(...) -> async<tensor<f32>>, async<tensor<f32>> {
  %0 = ... : tensor<f32>
  %1 = ... : tensor<f32>
  stablehlo.return %0, %1 : tensor<f32>, tensor<f32>
}

// HLO async-done
%t0, %t1 = stablehlo.async_done %f0, %f1 : async<tensor<f32>>, async<tensor<f32>>

So should the return type of async_start be (async<R0>, ..., async<RM>) instead of async<(R0, ..., RM)>?

Yes, because conceptually it should be possible to await just one result. On GPU that should have a straightforward lowering to streams and events.

That makes sense. I'll update the spec then.
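To make the "one async value per result" shape concrete, here is a minimal Python analogy rather than anything from the RFC or StableHLO itself. The names `async_start`, `async_done`, and `num_results` are hypothetical Python helpers, and a single-worker thread pool stands in for a separate stream.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Assumption: one worker thread stands in for "a stream separate from the
# main compute stream". This is an analogy, not how a backend works.
_side_stream = ThreadPoolExecutor(max_workers=1)


def async_start(body, *args, num_results):
    """Run body(*args) on the side stream; return one future per result."""
    futures = tuple(Future() for _ in range(num_results))

    def run():
        try:
            results = body(*args)
            for future, value in zip(futures, results):
                future.set_result(value)
        except Exception as exc:
            # Propagate the failure to every result that is still pending.
            for future in futures:
                if not future.done():
                    future.set_exception(exc)

    _side_stream.submit(run)
    return futures


def async_done(*futures):
    """Variadic await: block until every given future has a value."""
    return tuple(future.result() for future in futures)


f0, f1 = async_start(lambda a, b: (a + b, a * b), 2, 3, num_results=2)
t0, t1 = async_done(f0, f1)  # await all results
(only_t1,) = async_done(f1)  # awaiting a single result is also expressible
```

Because each result has its own future, awaiting a subset falls out naturally, which is the argument for (async<R0>, ..., async<RM>) over async<(R0, ..., RM)>.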

ezhulenev · 2024-09-17T21:51:31Z

rfcs/20240917-async-support.md

+
+### Async Execution
+
+Stable HLO programs are usually defined as simple sequential operations performed one after another, 


Actually, StableHLO (and HLO) has dataflow semantics, and the backend is free to reorder execution as long as it respects data dependencies (in HLO we also have control dependencies, but they are an internal implementation detail).

I'll reword this.
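As an illustration of the dataflow point, the sketch below treats a program as a dependency graph and accepts any execution order that respects data dependencies. The op names and the `is_valid_schedule` helper are hypothetical; `graphlib.TopologicalSorter` is from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical program: each op maps to the ops whose results it consumes.
deps = {
    "lhs": set(),
    "rhs": set(),
    "add": {"lhs", "rhs"},
    "mul": {"add", "rhs"},
}


def is_valid_schedule(order, deps):
    """True if `order` runs every op exactly once, after all of its inputs."""
    position = {op: i for i, op in enumerate(order)}
    if len(order) != len(deps) or set(position) != set(deps):
        return False
    return all(position[d] < position[op] for op in deps for d in deps[op])


# One order a backend could pick; others are equally legal.
one_schedule = list(TopologicalSorter(deps).static_order())
```

Both `["lhs", "rhs", "add", "mul"]` and `["rhs", "lhs", "add", "mul"]` pass the check, since `lhs` and `rhs` do not depend on each other.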

frgossen · 2024-09-24T20:52:47Z

rfcs/20240917-async-support.md

+
+Produces the output from executing the `body` function, but runs all operations on a stream separate from the main compute stream.
+
+The output of an `async_start` computation must first be processed by an `async_done` operation.


Do you expect to be able to pipe this through control flow? This is significantly more complicated if you want to pipe it through while and cond ops.

I think in MLIR-land that should be straightforward, as StableHLO has straightforward structured control flow, and I think the builtin MLIR dataflow analysis will work (needs fact checking!).

Yes, for SHLO this should be simple. I was rather wondering about the lowerings through XLA.

For now I think we would be OK with this not needing to cross control flow boundaries, but I could see the need eventually.

frgossen · 2024-09-24T20:55:34Z

rfcs/20240917-async-support.md

+
+
+```ebnf
+AsyncType ::= 'async' '<' ValueType '>'


Why do we need an async data type here? Within XLA, we encode this through tuples that forward operands and interim results from async start to async done ops. Would the same work at the StableHLO level?

Where I think this will be a bit special is when you want to guarantee that the values are not copied on loop boundaries, or generally around control flow. An alternative would be to introduce non-copyable values (buffers (?)).

In HLO we use tuples because it's very hard to add new types to HLO. In MLIR adding types is very easy and natural. If extending HLO were not that hard, I'd vote for adding an async type to it as well.

Update: in HLO we use tuples for async ops and rely on a bunch of implicit assumptions about how scheduling and buffer assignment work. I'm not a big fan of this, because without knowing the implementation details it is very hard to tell from the HLO alone what is going on. HLO starts with value semantics but at some point switches to buffer semantics, and nothing in the printed HLO tells you which semantics apply. Keeping StableHLO value-based with types is, in my opinion, a lot easier for a human to parse and makes it easier to tell what is going on from reading the IR.

Makes sense. We will have to lower it to tuples anyway, no? So this is really some form of syntactic sugar.

Yes, I think the async type and async bundle (tuple) representations are isomorphic and can always be converted back and forth:

stablehlo.async_start -> %start = (args, results, sync-flag) async_start

stablehlo.async_done %ret0, %ret1, ... -> async-done (get-tuple-element)

Tricky case: stablehlo.async_done %ret0 on just one of the returned values -> async-update (get-tuple-element). This effectively peels N result buffers and M argument buffers from the tuple and allows buffer assignment to reuse them (sorry, only an internal link for Frederik: http://goto.google.com/async-update-peeling). This is underspecified in HLO and we don't need to support it today; we can instead require that async_done awaits ALL results of the corresponding start operation.
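The "await ALL results" restriction could be checked with something like the following sketch. It works over a toy description of the IR (plain dicts and lists of value names), not over real MLIR. The function name `done_awaits_all` and the data layout are assumptions made for illustration only.

```python
def done_awaits_all(start_results, done_operands):
    """Check one async_done against the restriction discussed above.

    start_results: dict mapping a (hypothetical) start op name to the list
        of async values it produces.
    done_operands: list of async values consumed by a single async_done.

    Returns True only if the operands are exactly the results of one
    start op, with no duplicates and no values from other start ops.
    """
    operands = list(done_operands)
    if len(set(operands)) != len(operands):
        return False  # the same async value is awaited twice
    return any(set(results) == set(operands)
               for results in start_results.values())


starts = {"start0": ["%f0", "%f1"], "start1": ["%g0"]}
```

With this toy input, `["%f0", "%f1"]` and `["%g0"]` are accepted, while `["%f0"]` alone (the async-update case) and the mixed `["%f0", "%g0"]` are rejected.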

chaserileyroberts · 2024-10-01T18:35:55Z

Closing this, as we are moving to extending jax.compute_on instead.

chaserileyroberts force-pushed the main branch from af9df10 to a1419dd Compare September 17, 2024 21:18

ezhulenev reviewed Sep 17, 2024

chaserileyroberts force-pushed the main branch from a1419dd to 7d5744a Compare September 17, 2024 23:17

Added async RFC

b7f938d

chaserileyroberts force-pushed the main branch from 7d5744a to b7f938d Compare September 18, 2024 17:34

Async is now a ValueType

53765b5

frgossen suggested changes Sep 24, 2024

chaserileyroberts closed this Oct 1, 2024
