You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a rather large working item that will span across multiple PRs.
Currently Wasmi uses an enum based Instruction to represent all of its Wasmi bytecode instructions.
Pros & Cons
The advantage of this approach is that it is a very simple and rusty representation, easy to work with and allows to avoid unsafe Rust as much as possible.
The downside of this approach is that it is extremely dependent on the Rust and LLVM compiler and optimization pipeline to produce great machine code for the executor in the end as past experiences and experiments showed time and time on. The underlying issue is that this representation is too abstract and does not allow for the fine grained control over the layout and dispatch mechanism.
Major Flaw
Furthermore the enum based Instruction representation with its associated data fields cannot be easily combined with explicit tail calls once they are available in Rust (if ever).
The makepad-stitch Wasm interpreter has clearly showed that tail call based instruction dispatch is clearly superior to the enum based approach that Wasmi is using, especially on Apple Silicon and modern CPUs.
Threaded Code
The dispatching technique that we want to use is called threaded code dispatch and is made possible by explicit tail calls amongst others. Fundamentally there are two different kinds of threaded code:
Direct threading: This stores pointers to instruction handlers directly in the bytecode and simply invokes them once the instruction pointer points to them. This is fast and has the least amount of indirections during execution but not very space efficient because a pointer on a 64-bit machine takes up a whopping 8 bytes.
Indirect Threading: This stores simple C-like enum instruction op-codes instead of instruction handler function pointers directly in the bytecode and thus requires an additional indirection to resolve from the C-like enum variant to the associated instruction handler during execution. Some papers suggest that the overhead compared to direct threaded code is insignificant but we should probably test this ourselves. This form is easier to debug and simpler to deal with since the op-codes are still simple integers and not raw pointers.
Due to the above research related question of direct or indirect threaded code it is important to implement the new architecture to easily exchange the used technique in the end.
In case we want to support both styles of threaded code it should be easily possible to design the architecture so that conditional compilation could enable to disable either dispatching mode via a new Wasmi crate feature.
The new Wasmi bytecode can be dispatched using the match+loop approach used so far as a fall back when explicit tail calls are not available for an architecture or until they are not available in Rust at all. The execution performance difference should ideally be insignificant from the performance of today's Wasmi bytecode executor.
Tasks
Redefine Wasmi bytecode in terms of a new InstructionEncoder that provides a large API surface to encode all the required Wasmi bytecodes. We should stay as close as possible to the current set while also take into account for the new possibilities of the more relaxed bytecode representation. The new bytecode representation is more relaxed because it will no longer be organized on an 8 byte "instruction word" buffer but on a raw byte buffer. This means the encoding of an instruction is less restricted with respect to the amount of bytes it may use. We want this InstructionEncoder to be usable from both the Wasmi translator as well as the Wasmi translation test suite.
Rewrite the Wasmi translator in terms of the new InstructionEncoder.
Adjust all the translation tests to use the new InstructionEncoder and the new Wasmi translator.
Implement an InstructionDecoder that can be used to decode instructions encoded by the InstructionEncoder. This InstructionDecoder shall offer an efficient API for the Wasmi bytecode executor.
Implement the new Wasmi bytecode executor based on the new InstructionDecoder.
The text was updated successfully, but these errors were encountered:
With #1152 merged we now have a useful baseline Wasmi bytecode representation with which we could (in theory) test this out once explicit tail calls have been merged into Rust proper.
This has effectively been implemented in #1152.
With the new Wasmi bytecode and the macro sourced definitions we will be able to more or less simply move to tail call based dispatch once that is available.
This is a rather large working item that will span across multiple PRs.
Currently Wasmi uses an
enum
basedInstruction
to represent all of its Wasmi bytecode instructions.Pros & Cons
unsafe
Rust as much as possible.Major Flaw
Furthermore the
enum
basedInstruction
representation with its associated data fields cannot be easily combined with explicit tail calls once they are available in Rust (if ever).The
makepad-stitch
Wasm interpreter has clearly showed that tail call based instruction dispatch is clearly superior to theenum
based approach that Wasmi is using, especially on Apple Silicon and modern CPUs.Threaded Code
The dispatching technique that we want to use is called
threaded code
dispatch and is made possible by explicit tail calls amongst others. Fundamentally there are two different kinds ofthreaded code
:enum
instruction op-codes instead of instruction handler function pointers directly in the bytecode and thus requires an additional indirection to resolve from the C-likeenum
variant to the associated instruction handler during execution. Some papers suggest that the overhead compared to directthreaded code
is insignificant but we should probably test this ourselves. This form is easier to debug and simpler to deal with since the op-codes are still simple integers and not raw pointers.Due to the above research related question of direct or indirect threaded code it is important to implement the new architecture to easily exchange the used technique in the end.
In case we want to support both styles of
threaded code
it should be easily possible to design the architecture so that conditional compilation could enable to disable either dispatching mode via a new Wasmi crate feature.Link to GitHub Gist where I tried out some potential structure on the compiler explorer website to see what assembly is generated: https://gist.github.com/Robbepop/0b254c86dccabc48ae0eb41e89567ea0
Fallback & Performance
The new Wasmi bytecode can be dispatched using the
match
+loop
approach used so far as a fall back when explicit tail calls are not available for an architecture or until they are not available in Rust at all. The execution performance difference should ideally be insignificant from the performance of today's Wasmi bytecode executor.Tasks
InstructionEncoder
that provides a large API surface to encode all the required Wasmi bytecodes. We should stay as close as possible to the current set while also take into account for the new possibilities of the more relaxed bytecode representation. The new bytecode representation is more relaxed because it will no longer be organized on an 8 byte "instruction word" buffer but on a raw byte buffer. This means the encoding of an instruction is less restricted with respect to the amount of bytes it may use. We want thisInstructionEncoder
to be usable from both the Wasmi translator as well as the Wasmi translation test suite.InstructionEncoder
.InstructionEncoder
and the new Wasmi translator.InstructionDecoder
that can be used to decode instructions encoded by theInstructionEncoder
. ThisInstructionDecoder
shall offer an efficient API for the Wasmi bytecode executor.InstructionDecoder
.The text was updated successfully, but these errors were encountered: