Interp doc for implementation of calls #2858
base: feature/CoreclrInterpreter
Also included is an initial document about the runtime APIs used by the Mono interpreter.
docs/design/interpreter/calls.md
Outdated
The Mono approach is based on the fact that each virtual method of a class has a designated slot in the vtable that is constant across all classes that implement it. This means that in the object header we can have a method table with each slot containing the target `InterpMethod` that needs to be called when calling through that slot. For virtual methods that have generic arguments, we could end up having the same slot occupied by multiple generic instantiations of the same method. Instead of having a single target method in this slot, we would have a linked list of key/value pairs, mapping the virtual method that needs to be called to the target method. When a call is made through this slot, we would iterate the list and, if we don't find the method to call, we would call into the runtime to resolve the virtual method and then add it to this collection. For interface calls, we have a separate method table that has a fixed size. For each interface method we can compute a slot in this table that is fixed across all types implementing this interface. Calling an interface method means calling through this slot, where we can always have collisions with methods from other interfaces. We reuse the same mechanism of calling through a slot for virtual generic methods, with each slot holding a list of interface method / target method pairs that are used to resolve the actual method that needs to be called.
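As an illustration of the slot-with-list mechanism described in this paragraph, here is a minimal sketch in C. It is not Mono's actual code: `SlotEntry`, `slot_lookup`, `ResolveFn` and the `demo_*` helpers are hypothetical names, `InterpMethod` is reduced to a stub, and the sketch ignores thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the interpreter's method descriptor. */
typedef struct InterpMethod { const char *name; } InterpMethod;

/* One key/value pair: the virtual (or interface) method requested at the
 * call site, and the target method resolved for this type. */
typedef struct SlotEntry {
    const InterpMethod *key;
    InterpMethod *target;
    struct SlotEntry *next;
} SlotEntry;

/* Hypothetical slow path: asks the runtime to resolve the method. */
typedef InterpMethod *(*ResolveFn)(const InterpMethod *key, void *type);

/* Walk the list stored in a slot; on a miss, resolve through the runtime
 * and prepend the result so that the next call finds it in the list. */
static InterpMethod *slot_lookup(SlotEntry **slot, const InterpMethod *key,
                                 void *type, ResolveFn resolve)
{
    assert(slot != NULL && key != NULL);
    for (SlotEntry *e = *slot; e != NULL; e = e->next)
        if (e->key == key)
            return e->target;

    InterpMethod *target = resolve(key, type);
    SlotEntry *e = malloc(sizeof *e);
    if (e == NULL)
        return target;          /* still correct, just not cached */
    e->key = key;
    e->target = target;
    e->next = *slot;
    *slot = e;
    return target;
}

/* Demo resolver, for illustration only: counts slow-path calls. */
static int demo_resolve_calls;
static InterpMethod demo_target = { "Derived.M<int>" };

static InterpMethod *demo_resolve(const InterpMethod *key, void *type)
{
    (void)key;
    (void)type;
    demo_resolve_calls++;
    return &demo_target;
}
```

With `demo_resolve` as the slow path, the first `slot_lookup` for a given key calls into the resolver and the second one is served from the list.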
If we are to follow a similar approach, this would mean that in the `MethodTable` we would have at least an additional table where we would cache pairs of virtual method / target method. We could also have a one entry cache per call site, following on the idea of virtual stub dispatch. My current understanding is that if this call site cache fails, then falling back to calling into the runtime would be way too slow, so we would still need to have some sort of cache table in the `MethodTable` for the already resolved virtual calls on the type. If this is the case, then call site caching would be a feature that would provide limited benefit over `MethodTable` lookups, and can therefore be implemented later on if considered useful.
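The one entry cache per call site mentioned here could look roughly like the following sketch. All names (`CallSiteCache`, `call_site_resolve`, `SlowLookupFn`, the `demo_*` helpers) are hypothetical, `MethodTable` and `InterpMethod` are stubs, and the slow path stands in for whatever second-level lookup is chosen (a per-type table or a call into the runtime).

```c
#include <assert.h>
#include <stddef.h>

/* Stubs standing in for the real runtime structures. */
typedef struct InterpMethod { int id; } InterpMethod;
typedef struct MethodTable { int id; } MethodTable;

/* One-entry cache stored alongside a virtual call instruction. */
typedef struct CallSiteCache {
    const MethodTable *type;    /* receiver type seen last, or NULL */
    InterpMethod *target;       /* method resolved for that type */
} CallSiteCache;

/* Hypothetical second-level lookup used when the call site cache misses. */
typedef InterpMethod *(*SlowLookupFn)(const MethodTable *type,
                                      const InterpMethod *decl);

static InterpMethod *call_site_resolve(CallSiteCache *cache,
                                       const MethodTable *type,
                                       const InterpMethod *decl,
                                       SlowLookupFn slow)
{
    assert(type != NULL);
    if (cache->type == type)    /* monomorphic fast path */
        return cache->target;

    InterpMethod *target = slow(type, decl);
    cache->type = type;         /* remember the most recent receiver type */
    cache->target = target;
    return target;
}

/* Demo slow path, for illustration only: counts how often it runs. */
static int demo_slow_calls;
static InterpMethod demo_targets[2] = { { 100 }, { 200 } };

static InterpMethod *demo_slow_lookup(const MethodTable *type,
                                      const InterpMethod *decl)
{
    (void)decl;
    demo_slow_calls++;
    return &demo_targets[type->id & 1];
}
```

A call site that always sees the same receiver type hits the fast path after the first call; a change of receiver type costs one slow lookup and replaces the cached entry.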
CoreCLR uses a per-call-site cache to optimize monomorphic calls and a global singleton hashtable for polymorphic calls. The interpreter should use the same strategy as the rest of CoreCLR.
There are a number of different possible interface dispatch strategies with different tradeoffs. I do not think it makes sense to use different strategies in a single runtime. It would require paying twice for the supporting data structures, code and implementation complexity.
FWIW, .NET Framework 1.0 used yet another strategy: it was similar to the Mono strategy except that it used an extra indirection to avoid collisions.
I really dislike the idea of tweaking the `MethodTable` for interpreter-specific stuff and I'm all for reusing the same approach for virtual dispatch that the JIT has. I was just not sure how easy it is to efficiently reuse the same machinery, since I think it was designed for calls from compiled code, patching it, etc.
Based on the assumption that a method can either be compiled (aka present in AOT images) or interpreted, we could impose the condition that a function pointer for a compiled method is an aligned native function pointer and that a function pointer for an interpreted method is a tagged `InterpMethod`. The `InterpMethod` descriptor will need to contain a field for the `interp_entry` thunk that can be invoked by compiled code in order to begin executing this method with the interpreter. Both representations would need to be handled by JIT and interpreter compilation engines.
On the interpreter side of things, `ldftn` and `ldvirtftn` opcodes would resolve the method to be loaded and then look it up in the AOT images. If the method is found, then we would load the address of the method's code, otherwise we would load a tagged `InterpMethod` descriptor. `calli` would check if the function pointer is tagged or not. If it is tagged, it will untag it and proceed with the normal interpreter call invocation. If it is not tagged, then it will obtain the appropriate transition thunk (based on the signature embedded in the `calli` instruction) and it will call it, passing the compiled code pointer together with the pointer to the arguments on the interpreter stack.
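The tagging scheme that `ldftn` and `calli` rely on can be sketched in a few lines of C. This is an assumption-laden illustration, not the proposed implementation: the tag is assumed to be the low bit, which requires that native code pointers and `InterpMethod` allocations always have that bit clear (this does not hold on every platform, for example for Thumb code addresses on 32-bit ARM). The names `INTERP_FTN_TAG`, `tag_interp_method`, `is_interp_ftn` and `untag_interp_method` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stub standing in for the interpreter's method descriptor. */
typedef struct InterpMethod { int id; } InterpMethod;

/* Assumption: bit 0 is never set in a native code pointer or in the
 * address of an InterpMethod, so it can mark interpreted methods. */
#define INTERP_FTN_TAG ((uintptr_t)1)

/* What ldftn would produce for a method with no compiled code. */
static void *tag_interp_method(InterpMethod *method)
{
    uintptr_t bits = (uintptr_t)method;
    assert((bits & INTERP_FTN_TAG) == 0);   /* alignment leaves the bit free */
    return (void *)(bits | INTERP_FTN_TAG);
}

/* What calli would check first. */
static int is_interp_ftn(const void *ftn)
{
    return ((uintptr_t)ftn & INTERP_FTN_TAG) != 0;
}

/* Recover the descriptor before a normal interpreter call. */
static InterpMethod *untag_interp_method(void *ftn)
{
    assert(is_interp_ftn(ftn));
    return (InterpMethod *)((uintptr_t)ftn & ~INTERP_FTN_TAG);
}
```

In this sketch `calli` would branch on `is_interp_ftn`: a tagged value is untagged and dispatched as an interpreter call, while an untagged value is handed to the transition thunk for the call's signature.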
Should the same tagged pointer approach be used to dispatch virtual methods? I.e., the virtual method gets resolved to a code pointer that may be a tagged pointer, and then we treat it as an indirect call.
Sounds like a good idea that could help with reusing the same virtual dispatch machinery between interp/aot.
docs/design/interpreter/calls.md
Outdated
A direct call is a call where the method to be called is known at method compilation time, for example for static or non-virtual calls. In this situation, when the call is compiled, an `InterpMethod` is allocated for the target method that needs to be called and this pointer is embedded into the generated interpreter code. No additional work is needed at execution: the `InterpMethod` is fetched from the opcode stream, for the first call the method will have to be compiled, and then call dispatch continues as described above.
In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
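The compile-time decision described in these two paragraphs could be sketched as follows. The opcode names `OP_CALL_INTERP` and `OP_CALL_AOT`, the `CallInstr` layout, the callback types and the `demo_*` helpers are all hypothetical; the real interpreter would encode the operand into its opcode stream rather than return a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Stub standing in for the interpreter's method descriptor. */
typedef struct InterpMethod { int id; } InterpMethod;

/* Two hypothetical opcodes: one for interpreted targets, one for targets
 * that already have compiled code in an AOT image. */
typedef enum { OP_CALL_INTERP, OP_CALL_AOT } CallOpcode;

typedef struct CallInstr {
    CallOpcode op;
    void *operand;   /* InterpMethod* or compiled code address */
} CallInstr;

/* Hypothetical runtime queries used while compiling a call. */
typedef void *(*AotLookupFn)(const void *method);
typedef InterpMethod *(*GetInterpMethodFn)(const void *method);

static CallInstr emit_direct_call(const void *method,
                                  AotLookupFn find_aot_code,
                                  GetInterpMethodFn get_interp_method)
{
    CallInstr ins;
    void *code = find_aot_code(method);
    if (code != NULL) {
        /* Compiled target: embed the code pointer; at execution the
         * opcode fetches the transition wrapper and calls through it. */
        ins.op = OP_CALL_AOT;
        ins.operand = code;
    } else {
        /* Interpreted target: embed the InterpMethod descriptor. */
        ins.op = OP_CALL_INTERP;
        ins.operand = get_interp_method(method);
    }
    assert(ins.operand != NULL);
    return ins;
}

/* Demo helpers, for illustration only. */
static int demo_aot_method;          /* handle of a method found in an AOT image */
static char demo_aot_code;           /* stands in for its compiled code */
static InterpMethod demo_interp_method = { 1 };

static void *demo_find_aot(const void *method)
{
    return method == (const void *)&demo_aot_method ? (void *)&demo_aot_code : NULL;
}

static InterpMethod *demo_get_interp(const void *method)
{
    (void)method;
    return &demo_interp_method;
}
```

The check happens once, when the calling method is compiled, so the execution path of either opcode does not need to ask again whether the target is compiled.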
It would be useful to enumerate the different types of transition wrappers that we expect to exist in the AOT image to support interaction of interpreted and AOT compiled code.
The description of transition wrappers is included in the `compiled-code-interop.md` document. There are mainly two types of wrappers, each handling a type of signature:
- interp exit: for interp->aot or pinvoke calls from interp. Conservatively, we should generate these in the aot image for all signatures of aot compiled methods and for pinvoke signatures
- interp entry: for aot->interp or unmanaged callers only. Conservatively, we should generate these in the aot image for all indirect call signatures that are encountered in aot-ed code and unmanaged callers only signatures. This wrapper will also need dynamically generated thunks/fat pointer mechanism for embedding additional argument.
It's plausible that we would encounter scenarios where a required wrapper is missing, and I've described alternatives, but I don't think it is something that should be prioritized at this stage. Also, I imagine there could be minor differences between interoperating with aot code and interoperating with foreign code via pinvokes. I'm not convinced this would actually require separate wrappers.
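To make the "one wrapper per signature" idea concrete, here is a small sketch of an interp exit wrapper table keyed by signature. Everything in it is an assumption for illustration: the signature string format (`"i4(i4)"`), the `WrapperEntry` table, `find_interp_exit` and the demo callee are hypothetical, and a real wrapper would marshal arguments from the interpreter stack according to the platform calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type used to carry compiled code addresses. */
typedef void (*NativeCode)(void);

/* An interp exit wrapper: receives the compiled code pointer, a pointer to
 * the arguments on the interpreter stack, and a slot for the return value. */
typedef void (*InterpExitWrapper)(NativeCode code, const void *args, void *ret);

typedef struct WrapperEntry {
    const char *signature;      /* hypothetical signature key */
    InterpExitWrapper wrapper;
} WrapperEntry;

/* Wrapper for the signature "i4(i4)": one int argument, int return value. */
static void interp_exit_i4_i4(NativeCode code, const void *args, void *ret)
{
    int (*fn)(int) = (int (*)(int))code;
    *(int *)ret = fn(*(const int *)args);
}

/* Table that would be generated ahead of time into the AOT image. */
static const WrapperEntry wrapper_table[] = {
    { "i4(i4)", interp_exit_i4_i4 },
};

/* Returns NULL when no precompiled wrapper exists for the signature. */
static InterpExitWrapper find_interp_exit(const char *signature)
{
    assert(signature != NULL);
    for (size_t i = 0; i < sizeof wrapper_table / sizeof wrapper_table[0]; i++)
        if (strcmp(wrapper_table[i].signature, signature) == 0)
            return wrapper_table[i].wrapper;
    return NULL;
}

/* Demo compiled callee, for illustration only. */
static int demo_add_one(int x)
{
    return x + 1;
}
```

The `NULL` result is the "missing wrapper" case discussed above; how to handle it (fail, or fall back to a more generic mechanism) is exactly the open question in this thread.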
docs/design/interpreter/calls.md
Outdated
CoreCLR has been moving towards a similar kind of transition wrapper to support `MethodInfo.Invoke`. They are dynamically generated and JITed today. It would be nice to share the transition wrappers between reflection and the interpreter.
docs/design/interpreter/calls.md
Outdated
- In order to account for the scenario where the method to be called is aot compiled, when emitting code during compilation, we would first check if the method is present in an aot image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
+ In order to account for the scenario where the method to be called is AOT compiled, when emitting code during compilation, we would first check if the method is present in an AOT image. If that's the case, we would emit a different opcode instead and embed in the code the pointer to be called. During execution, this opcode would fetch the transition wrapper and proceed with the call.
Throughout.
### Calls from compiled code
Whenever compiled code needs to call a method, unless the call is direct, it will at some point query the runtime for the address of the method code. If at any point the runtime fails to resolve the method code, it should fall back to generating/obtaining a transition wrapper for the method in question. The caller will have no need to know whether it is calling compiled code or whether it is calling into the interpreter; the calling convention will be identical.
Since this is the interpreter mode, I think I'd like to have a requirement imposed here and say the thunk must be precompiled. I don't see the obvious benefit to generating anything on the fly. I'd like to see some of these cases narrowly defined with asserts rather than functionality that we aren't testing or using regularly.
Do we have a scenario where we need an unmanaged entry point and don't have an AOT image?
Wasm?
Isn't that just a current implementation detail? Is it needed for CoreCLR's interpreter? For iOS we can utilize the remapping trick so generating an actual callable native function pointer is relatively simple. For WASM, can we impose a requirement that there be some AOT requirement for reverse P/Invoke thunks or we generate a finite set of entries and when that is exhausted, failure?
It is possible I am missing something here, but I'd really like to avoid generating scenarios that are complete for completeness' sake. Inevitably we create solutions that bit rot. This is top of mind for me because we have so many features that aren't used and yet dictate how we currently design and build new features. If we need them and they have consistent testing, let's do it; otherwise we shouldn't be creating features that will inevitably atrophy because they aren't needed.
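The "finite set of entries, and failure when that is exhausted" option can be sketched without any runtime code generation. The following is only an illustration under stated assumptions: the pool size, the single fixed signature, `entry_pool_acquire` and `interp_invoke` are hypothetical, and a real pool would need one set of entries per supported signature plus synchronization.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_POOL_SIZE 4   /* hypothetical fixed number of precompiled entries */

/* Stub standing in for the interpreter's method descriptor. */
typedef struct InterpMethod { int id; } InterpMethod;

/* One fixed native signature for the sketch. */
typedef int (*NativeEntry)(int arg);

/* Which interpreted method each precompiled entry is currently bound to. */
static InterpMethod *entry_pool_targets[ENTRY_POOL_SIZE];

/* Stand-in for executing `method` in the interpreter. */
static int interp_invoke(InterpMethod *method, int arg)
{
    return method->id + arg;
}

/* Each entry is ordinary precompiled code that knows its own slot index. */
#define DEFINE_ENTRY(n) \
    static int entry_##n(int arg) { return interp_invoke(entry_pool_targets[n], arg); }
DEFINE_ENTRY(0)
DEFINE_ENTRY(1)
DEFINE_ENTRY(2)
DEFINE_ENTRY(3)

static const NativeEntry entry_pool[ENTRY_POOL_SIZE] = {
    entry_0, entry_1, entry_2, entry_3,
};

/* Returns a callable native pointer bound to `method`, reusing an existing
 * binding when there is one, or NULL once the pool is exhausted. */
static NativeEntry entry_pool_acquire(InterpMethod *method)
{
    int free_slot = -1;
    assert(method != NULL);
    for (int i = 0; i < ENTRY_POOL_SIZE; i++) {
        if (entry_pool_targets[i] == method)
            return entry_pool[i];
        if (entry_pool_targets[i] == NULL && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return NULL;            /* pool exhausted: report failure */
    entry_pool_targets[free_slot] = method;
    return entry_pool[free_slot];
}
```

The trade-off is the one raised in the comment: the behavior is narrow and easy to assert on, at the cost of a hard limit on how many interpreted methods can be exposed as native entry points.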
I've touched on this subject in the `compiled-code-interop` document. Ideally we would have precompiled wrappers for all necessary signatures. Let's say aot code does an indirect call. During compilation we can conservatively assume that this call might end up in the interpreter, so we would generate an `interp_entry` wrapper for this signature. However, I think there can be cases where we might not know exactly all the required signatures. Let's say this call has some generic arguments that maybe aren't easy to determine during app compilation (?? I don't fully understand the generic sharing story when valuetypes are included). Or we might simply observe that generating wrappers for every signature is incredibly wasteful in terms of app size, and try to generate fewer of them at the cost of sometimes not having an immediately available wrapper. I described a potential fallback in the document I mentioned, where we would emit some low level IR to be able to handle any kind of transition. I loosely referred to this approach as "generating code", but it doesn't actually mean emitting executable code at runtime, since this is assumed impossible.
However, on a separate note, I strongly think it would be good to be able to compile these wrappers at run time with the JIT, in case it is not too difficult. If you have a bug that you are sure is interpreter related, it might be a massive pain to have to run all sorts of build tools like assembly scanning, aot compiling all wrappers, etc. Just writing a console sample and running the assembly with `corerun` would greatly speed up development on the interpreter.
I think it is reasonable to require some sort of AOT image for Wasm that does imply some sort of AOT pass for both build and publish.
@lewing wouldn't that mean we'd need to have the wasm toolchain available all the time?