Interp doc for implementation of calls #2858
base: feature/CoreclrInterpreter
@@ -0,0 +1,46 @@

### Interpreter call convention

Within the interpreter, every method will have a structure allocated that holds information about the managed method. Let's assume this structure is named `InterpMethod` (which is its name within the Mono runtime). It will contain information that is relevant during execution, for example a pointer to the interpreter IR code, the size of the stack that the method uses, and more.

Interpreter opcodes operate on offsets from the base stack pointer of the current interpreter frame. The arguments for a method will be stored one after the other on the interpreter stack, each at an offset aligned to 8 bytes (at least on a 64-bit architecture). Every primitive type fits in an 8-byte stack slot, while value types can occupy more stack space. With every call, the interpreter stack pointer will be bumped to a new location where the arguments already reside. This means that the responsibility of a call instruction is to first resolve the actual method that needs to be called and then initialize the state for executing the new method: mainly, initialize a new `InterpFrame`, set the current stack pointer to the location of the arguments on the interpreter stack, and set the `ip` to the start of the IR opcode buffer.

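The slot layout and frame setup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the runtime's implementation: the 8-byte slot size is the one mentioned in the text, while the fields of `InterpMethod` and `InterpFrame` and the helpers `align_to_slot`, `compute_arg_offsets` and `interp_frame_init` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define INTERP_SLOT_SIZE 8 /* assumed stack slot size on a 64-bit target */

/* Hypothetical per-method descriptor; the real structure holds much more. */
typedef struct InterpMethod {
    const uint16_t *code; /* start of the interpreter IR opcode buffer */
    size_t stack_size;    /* stack space used by the method, in bytes */
} InterpMethod;

/* Hypothetical execution state of one interpreted call. */
typedef struct InterpFrame {
    struct InterpFrame *parent;
    InterpMethod *method;
    uint8_t *stack;     /* base stack pointer; opcodes address offsets from it */
    const uint16_t *ip; /* next IR opcode to execute */
} InterpFrame;

/* Round a size up to a whole number of stack slots. */
static size_t align_to_slot(size_t size)
{
    return (size + INTERP_SLOT_SIZE - 1) & ~(size_t)(INTERP_SLOT_SIZE - 1);
}

/* Store the stack offset of each argument in `offsets` and return the total
   number of bytes the arguments occupy. */
static size_t compute_arg_offsets(const size_t *arg_sizes, size_t arg_count,
                                  size_t *offsets)
{
    size_t offset = 0;
    for (size_t i = 0; i < arg_count; i++) {
        offsets[i] = offset;
        offset += align_to_slot(arg_sizes[i]);
    }
    return offset;
}

/* Prepare a frame for an already resolved callee whose arguments already sit
   at `args` on the interpreter stack. */
static void interp_frame_init(InterpFrame *frame, InterpFrame *parent,
                              InterpMethod *callee, uint8_t *args)
{
    frame->parent = parent;
    frame->method = callee;
    frame->stack = args;
    frame->ip = callee->code;
}
```

With argument sizes of 4, 8 and 20 bytes, the arguments land at offsets 0, 8 and 16 and occupy 40 bytes in total, since the 20-byte value type is rounded up to three slots.
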
If we need to call a method that has its code compiled, we would instead call a thunk that performs the calling convention translation, moving arguments from the interpreter stack to the native stack/registers, and then dispatches to the compiled code.

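As an illustration of such a thunk, here is a sketch of a wrapper for one fixed signature, written in C so that the C compiler takes care of placing the arguments in native registers. The names `NativeCode`, `interp_to_native_i8_i8_i8` and `sample_add` are hypothetical, and the layout of one 8-byte slot per argument is the assumption taken from the paragraph on stack slots; a real runtime would need one wrapper per signature shape.

```c
#include <stdint.h>
#include <string.h>

/* Generic code pointer; it is cast to the real signature before the call. */
typedef void (*NativeCode)(void);

/* Hypothetical wrapper for the signature int64_t (int64_t, int64_t).
   `args` points at the arguments on the interpreter stack, one 8-byte slot
   each; the result is written to the slot at `ret`. */
static void interp_to_native_i8_i8_i8(NativeCode target, const uint8_t *args,
                                      uint8_t *ret)
{
    int64_t a, b, result;
    memcpy(&a, args, sizeof a);
    memcpy(&b, args + 8, sizeof b);
    /* The call below moves a and b into the native registers or stack. */
    result = ((int64_t (*)(int64_t, int64_t))target)(a, b);
    memcpy(ret, &result, sizeof result);
}

/* Stand-in for a compiled method. */
static int64_t sample_add(int64_t a, int64_t b)
{
    return a + b;
}
```

Calling `interp_to_native_i8_i8_i8((NativeCode)sample_add, args, ret)` with the slots holding 40 and 2 leaves 42 in the return slot.
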
### Direct calls

A direct call is a call where the method to be called is known at method compilation time, for example a static or non-virtual call. In this situation, when the call is compiled, an `InterpMethod` is allocated for the target method and its pointer is embedded into the generated interpreter code. No additional work is needed at execution: the `InterpMethod` is fetched from the opcode stream, on the first call the method is compiled, and then call dispatch continues as described above.

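A minimal sketch of embedding the `InterpMethod` pointer in the opcode stream and fetching it at execution time might look like this. The 16-bit IR word size, the opcode values, and the helpers `emit_call`, `read_call_target`, `compile_method` and `resolve_direct_call` are assumptions made for this example; `compile_method` is a stand-in that only records that compilation happened.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_RET = 0, OP_CALL = 1 }; /* hypothetical opcode values */

/* Number of 16-bit IR words needed to hold one embedded pointer. */
#define PTR_WORDS (sizeof(void *) / sizeof(uint16_t))

typedef struct InterpMethod {
    const uint16_t *code; /* NULL until the method is compiled to IR */
    int compile_count;    /* demo only: how many times it was compiled */
} InterpMethod;

/* Emit a call opcode followed by the embedded target pointer and return
   the position of the next instruction. */
static uint16_t *emit_call(uint16_t *code, uint16_t opcode, InterpMethod *target)
{
    code[0] = opcode;
    memcpy(&code[1], &target, sizeof target);
    return code + 1 + PTR_WORDS;
}

/* Read the pointer embedded after the opcode at `ip`. */
static InterpMethod *read_call_target(const uint16_t *ip)
{
    InterpMethod *target;
    memcpy(&target, ip + 1, sizeof target);
    return target;
}

/* Stand-in for IR generation. */
static const uint16_t placeholder_ir[] = { OP_RET };

static void compile_method(InterpMethod *method)
{
    method->code = placeholder_ir;
    method->compile_count++;
}

/* Execution side of a direct call: fetch the target from the opcode stream
   and compile it on first use. */
static InterpMethod *resolve_direct_call(const uint16_t *ip)
{
    InterpMethod *target = read_call_target(ip);
    if (target->code == NULL)
        compile_method(target);
    return target;
}
```

Resolving the same call site twice returns the same `InterpMethod` and compiles it only once.
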
To account for the scenario where the method to be called is AOT compiled, when emitting code during compilation we would first check whether the method is present in an AOT image. If it is, we would emit a different opcode and embed the pointer to be called in the code. During execution, this opcode would fetch the transition wrapper and proceed with the call.

Review comment: CoreCLR has been moving towards a similar kind of transition wrappers to support `MethodInfo.Invoke`. They are dynamically generated and JITed today. It would be nice to share the transition wrappers between reflection and the interpreter.

Review comment: Suggested change

Review comment: Throughout.

### Virtual/Interface calls

When we need to do a virtual call, we would include the virtual `InterpMethod` in the call opcode, but it needs to be resolved at run time against the object that it gets called on. Calling into the runtime to resolve the target method is expected to be slow, so an alternative is required.

The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable that is constant across all classes that implement it. This means that in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot.

For virtual methods that have generic arguments, the same slot could end up occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list, and if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection.

For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table that is fixed across all types implementing this interface. Calling an interface method means calling through this slot, where collisions with methods from other interfaces are always possible. We reuse the mechanism used for virtual generic methods: each slot holds a list of interface method / target method pairs that are used to resolve the actual method that needs to be called.

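The slot lookup with a fallback to the runtime can be sketched as below. This is an illustration only: `SlotEntry`, `SlowResolver`, `slot_resolve` and the demo resolver are hypothetical names, the `name` field exists just to make the example readable, and the sketch ignores thread safety, which a real implementation would have to handle when publishing new entries.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct InterpMethod {
    const char *name; /* demo only */
} InterpMethod;

/* One key/value pair in a slot: virtual (or interface) method -> target. */
typedef struct SlotEntry {
    const InterpMethod *virtual_method;
    InterpMethod *target;
    struct SlotEntry *next;
} SlotEntry;

/* Stand-in for the slow call into the runtime. */
typedef InterpMethod *(*SlowResolver)(const InterpMethod *virtual_method,
                                      void *context);

/* Look the method up in the slot's list; on a miss, resolve it through the
   runtime and cache the result at the head of the list. */
static InterpMethod *slot_resolve(SlotEntry **slot,
                                  const InterpMethod *virtual_method,
                                  SlowResolver resolve, void *context)
{
    for (SlotEntry *e = *slot; e != NULL; e = e->next) {
        if (e->virtual_method == virtual_method)
            return e->target;
    }
    InterpMethod *target = resolve(virtual_method, context);
    SlotEntry *entry = malloc(sizeof *entry);
    if (entry != NULL) { /* on allocation failure the result is just not cached */
        entry->virtual_method = virtual_method;
        entry->target = target;
        entry->next = *slot;
        *slot = entry;
    }
    return target;
}

/* Demo resolver that counts how often the slow path is taken. */
typedef struct DemoResolver {
    InterpMethod *target;
    int calls;
} DemoResolver;

static InterpMethod *demo_resolve(const InterpMethod *virtual_method,
                                  void *context)
{
    DemoResolver *demo = context;
    (void)virtual_method;
    demo->calls++;
    return demo->target;
}
```

The second call through the same slot with the same key is served from the list, so the slow resolver runs only once.
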
If we are to follow a similar approach, this would mean that in the `MethodTable` we would have at least one additional table where we would cache pairs of virtual method / target method. We could also have a one-entry cache per call site, following the idea of virtual stub dispatch. My current understanding is that if this call site cache fails, then falling back to calling into the runtime would be way too slow, so we would still need some sort of cache table in the `MethodTable` for the already resolved virtual calls on the type. If this is the case, then call site caching would provide limited benefit over `MethodTable` lookups, and can therefore be implemented later on if considered useful.

Review comment: CoreCLR uses a per-call-site cache to optimize monomorphic calls and a global singleton hashtable for polymorphic calls. The interpreter should use the same strategy as the rest of CoreCLR. There are a number of different possible interface dispatch strategies with different tradeoffs. I do not think it makes sense to use different strategies in a single runtime. It would require paying twice for the supporting data structures, code and implementation complexity. FWIW, .NET Framework 1.0 used yet another strategy: it was similar to the Mono strategy except that it used an extra indirection to avoid collisions.

Review comment: I really dislike the idea of tweaking the

During compilation, we have no way to tell whether a virtual method will resolve to a method that was already compiled or to one that needs to be interpreted. This means that once we resolve a virtual method, we will have to do an additional check before actually executing the target method with the interpreter. We will have to look up the method code in AOT images (this lookup should happen only during the first invocation; a flag should be set on the `InterpMethod` once the lookup is completed). Following this check, we would either continue with a normal interpreter call or dispatch via a transition wrapper. This means that we can have an `InterpMethod` structure allocated for an AOT-compiled method, with a flag set to indicate that it is AOT compiled and that we never attempt to execute it with the interpreter.

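The "look up once, then remember" check could be sketched like this. The three-state `CodeKind` flag, the `AotLookup` callback and the helper `interp_method_code_kind` are assumptions for illustration; the demo lookup functions stand in for a real search of the AOT images.

```c
#include <stddef.h>

typedef void (*NativeCode)(void);

/* Hypothetical flag stored on the InterpMethod. */
typedef enum { CODE_UNKNOWN, CODE_INTERPRETED, CODE_AOT } CodeKind;

typedef struct InterpMethod {
    CodeKind kind;          /* CODE_UNKNOWN until the first invocation */
    NativeCode native_code; /* set only when the method is AOT compiled */
} InterpMethod;

/* Stand-in for searching the AOT images; returns NULL when not found. */
typedef NativeCode (*AotLookup)(const InterpMethod *method);

/* Classify the resolved target, doing the AOT lookup only the first time. */
static CodeKind interp_method_code_kind(InterpMethod *method, AotLookup lookup)
{
    if (method->kind == CODE_UNKNOWN) {
        method->native_code = lookup(method);
        method->kind = method->native_code != NULL ? CODE_AOT : CODE_INTERPRETED;
    }
    return method->kind;
}

/* Demo lookups that count how often the AOT images are searched. */
static int demo_lookup_calls;

static void demo_compiled_body(void)
{
}

static NativeCode demo_lookup_hit(const InterpMethod *method)
{
    (void)method;
    demo_lookup_calls++;
    return demo_compiled_body;
}

static NativeCode demo_lookup_miss(const InterpMethod *method)
{
    (void)method;
    demo_lookup_calls++;
    return NULL;
}
```

A caller would switch on the returned kind: `CODE_INTERPRETED` continues with a normal interpreter call, while `CODE_AOT` dispatches `native_code` through a transition wrapper.
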
### Indirect calls

Indirect calls in the interpreter mean that we have no information about the method to be called at compile time; there is no `InterpMethod` descriptor embedded in the code. The method to be called would be loaded from the interpreter stack. If we knew for sure that we were calling into the interpreter, the solution would be trivial: code that loads a function pointer would load an `InterpMethod`, and indirect calls would execute it in the same way as a normal call would. The problem arises from interoperating with compiled code. Compiled code would rather express a function pointer as a callable native pointer, whereas the interpreter would rather express it directly as an `InterpMethod` pointer.

Based on the assumption that a method can either be compiled (that is, present in AOT images) or interpreted, we could impose the condition that a function pointer for a compiled method is an aligned native function pointer and that a function pointer for an interpreted method is a tagged `InterpMethod`. The `InterpMethod` descriptor will need to contain a field for the `interp_entry` thunk that can be invoked by compiled code in order to begin executing this method with the interpreter. Both representations would need to be handled by the JIT and interpreter compilation engines.

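One possible encoding of the tag is the lowest bit of the pointer, sketched below. The choice of bit 0, the macro `FTNPTR_INTERP_TAG` and the helper names are assumptions for this example; the scheme relies on both `InterpMethod` allocations and native code addresses being at least 2-byte aligned, which does not hold on every target (some architectures already use the low bit of code addresses), so a real implementation would need to pick the tag per platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tag: the lowest pointer bit marks an interpreted method. */
#define FTNPTR_INTERP_TAG ((uintptr_t)1)

typedef struct InterpMethod {
    void (*interp_entry)(void); /* thunk callable from compiled code */
} InterpMethod;

/* Build the function pointer value for an interpreted method. */
static uintptr_t ftnptr_from_interp_method(InterpMethod *method)
{
    return (uintptr_t)method | FTNPTR_INTERP_TAG;
}

/* True when the function pointer denotes an interpreted method. */
static bool ftnptr_is_interp(uintptr_t ftnptr)
{
    return (ftnptr & FTNPTR_INTERP_TAG) != 0;
}

/* Recover the descriptor from a tagged function pointer. */
static InterpMethod *ftnptr_to_interp_method(uintptr_t ftnptr)
{
    return (InterpMethod *)(ftnptr & ~FTNPTR_INTERP_TAG);
}
```

An untagged value is treated as directly callable native code, so the common compiled-to-compiled path pays only for the single bit test.
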
On the interpreter side of things, the `ldftn` and `ldvirtftn` opcodes would resolve the method to be loaded and then look it up in the AOT images. If the method is found, we would load the address of the method's code; otherwise we would load a tagged `InterpMethod` descriptor. `calli` would check whether the function pointer is tagged. If it is tagged, it will untag it and proceed with the normal interpreter call invocation. If it is not tagged, it will obtain the appropriate transition thunk (for the signature embedded in the `calli` instruction) and call it, passing the compiled code pointer together with the pointer to the arguments on the interpreter stack.

Review comment: Should the same tagged pointer approach be used to dispatch virtual methods? That is, the virtual method gets resolved to a code pointer that may be a tagged pointer, and then we treat it as an indirect call.

Review comment: Sounds like a good idea that could help with reusing the same virtual dispatch machinery between interp/AOT.

In a similar fashion, on the JIT side of things, `ldftn` and `ldvirtftn` will produce either a callable function pointer or a tagged `InterpMethod` descriptor. `calli` would check whether the function pointer is tagged. If it is not tagged, it would do a normal native call. If it is tagged, it will instead do a normal native call through `InterpMethod->interp_entry`. This field will have to be initialized to an interp entry as part of the `ldftn`, but it could also be lazily initialized at the cost of another check.

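The lazily initialized variant mentioned above could look roughly like the following sketch of the check compiled code would perform before an indirect call. It assumes a low-bit tag on interpreted function pointers; `FTNPTR_INTERP_TAG`, `EntryFactory`, `calli_native_target` and the demo helpers are hypothetical names, and the integer-to-function-pointer cast is implementation-defined C that works on the usual flat-address targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed tag: the lowest pointer bit marks an interpreted method. */
#define FTNPTR_INTERP_TAG ((uintptr_t)1)

typedef void (*NativeCode)(void);

typedef struct InterpMethod {
    NativeCode interp_entry; /* NULL until an entry thunk is created */
} InterpMethod;

/* Stand-in for the runtime service that creates an interp entry thunk. */
typedef NativeCode (*EntryFactory)(InterpMethod *method);

/* Return the address compiled code should call for `ftnptr`. */
static NativeCode calli_native_target(uintptr_t ftnptr, EntryFactory make_entry)
{
    if ((ftnptr & FTNPTR_INTERP_TAG) == 0)
        return (NativeCode)ftnptr; /* already callable native code */

    InterpMethod *method = (InterpMethod *)(ftnptr & ~FTNPTR_INTERP_TAG);
    if (method->interp_entry == NULL) /* the extra check of lazy initialization */
        method->interp_entry = make_entry(method);
    return method->interp_entry;
}

/* Demo factory that counts how many entry thunks were created. */
static int demo_entries_created;

static void demo_interp_entry(void)
{
}

static NativeCode demo_make_entry(InterpMethod *method)
{
    (void)method;
    demo_entries_created++;
    return demo_interp_entry;
}
```

If `ldftn` initializes `interp_entry` eagerly instead, the inner `NULL` check disappears and the tagged path becomes a mask plus one load.
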
### PInvoke calls

PInvoke calls can be made either by a normal `call` to a PInvoke method or by doing a `calli` with an unmanaged signature. In both cases, the target function pointer that is available during code execution is a callable native function pointer. We just need to obtain the transition wrapper and call it, passing the native function pointer.

### Calls from compiled code

Whenever compiled code needs to call a method, unless the call is direct, it will at some point query the runtime for the address of the method code. If at any point the runtime fails to resolve the method code, it should fall back to generating/obtaining a transition wrapper for the method in question. The caller will have no need to know whether it is calling compiled code or calling into the interpreter; the calling convention will be identical.

Review comment: Since this is the interpreter mode, I think I'd like to have a requirement imposed here and say the thunk must be precompiled. I don't see the obvious benefit to generating anything on the fly. I'd like to see some of these cases narrowly defined with asserts rather than functionality that we aren't testing or using regularly. Do we have a scenario where we need an unmanaged entry point and don't have an AOT image?

Review comment: Wasm?

Review comment: Isn't that just a current implementation detail? Is it needed for CoreCLR's interpreter? For iOS we can utilize the remapping trick, so generating an actual callable native function pointer is relatively simple. For WASM, can we impose a requirement that there be some AOT requirement for reverse P/Invoke thunks, or that we generate a finite set of entries and fail when that is exhausted? It is possible I am missing something here, but I'd really like to avoid generating scenarios that are complete for completeness' sake. Inevitably we create solutions that bit rot. This is top of mind for me because we have so many features that aren't used and that dictate how we currently design and build new features. If we need them and they have consistent testing, let's do it; otherwise we shouldn't be creating features that will inevitably atrophy because they aren't needed.

Review comment: I've touched this subject in the However, on a separate note, I strongly think it would be good to be able to compile these wrappers at run time with the JIT, in case it is not too difficult. If you have a bug that you are sure is interpreter related, it might be a massive pain to have to run all sorts of build tools like assembly scanning, AOT compiling all wrappers, etc. Just writing a console sample and run the assembly with

Review comment: I think it is reasonable to require some sort of AOT image for Wasm; that does imply some sort of AOT pass for both build and publish.

Review comment: @lewing wouldn't that mean we'd need to have the wasm toolchain available all the time?

### Delegate calls

The runtime provides special support for creating/invoking delegates. A delegate invocation can end up either in compiled code or in the interpreter. On Mono, the delegate object has JIT-specific fields and interpreter-specific fields, and the initialization is quite messy. Ideally, as a starting solution at least, a delegate would contain a function pointer, and its invocation would reuse the patterns for indirect calls via `calli` without requiring much special casing.

Review comment: It would be useful to enumerate the different types of transition wrappers that we expect to exist in the AOT image to support interaction of interpreted and AOT compiled code.

Review comment: The description of transition wrappers is included in the `compiled-code-interop.md` document. There are mainly two types of wrappers, each handling a type of signature:

It's plausible that we would encounter scenarios where a required wrapper is missing, and I've described alternatives, but I don't think it is something that should be prioritized at this stage. Also, I imagine there could be minor differences between interoperating with AOT code and interoperating with foreign code via PInvokes. I'm not convinced that this would actually require separate wrappers.