Adding padding to precode #111819

Open · wants to merge 1 commit into main

Conversation

@mikelle-rogers (Member) commented Jan 24, 2025

Add padding to the precode to align the StubPrecode with the offset of its data.
We have a single constant, “StubPrecode::CodeSize”, which is 24 in this case. This constant is used as the size of the code that gets copied from the template code. The effect is that instead of 12 bytes of StubPrecode code followed by e.g. 12 zeros, we get 12 bytes of StubPrecode code followed by 12 bytes from the beginning of the FixupPrecodeCode.
That leads to stubStartAddress - FixupPrecode::FixupCodeOffset being misdetected as a FixupPrecode, because part of the FixupPrecode code is in fact present there.
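A minimal sketch of the mechanism described above (the helper name, constants, and template symbol are illustrative, not the actual runtime identifiers):

#include <cstdint>
#include <cstring>

const size_t kCodeSize = 24; // StubPrecode::CodeSize on this target

extern const uint8_t StubPrecodeTemplate[]; // illustrative template symbol

void InstantiateStubPrecode(uint8_t* slot)
{
    // The runtime copies a fixed CodeSize bytes from the template. If the
    // template emits only 12 bytes of real code, the other 12 bytes of the
    // copy come from whatever follows it in the image -- here, the start of
    // FixupPrecodeCode. Padding the template itself out to CodeSize (this
    // PR's fix) makes those trailing bytes well-defined instead.
    std::memcpy(slot, StubPrecodeTemplate, kCodeSize);
}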

Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@jkotas requested a review from janvorli · January 24, 2025 23:25
@jkotas (Member) commented Jan 24, 2025

We try to keep these precodes as small as possible. We allocate a lot of them. Is this change going to be a measurable memory consumption regression?

@mikelle-rogers requested a review from a team · January 24, 2025 23:59
@janvorli (Member)

> We try to keep these precodes as small as possible. We allocate a lot of them. Is this change going to be a measurable memory consumption regression?

@jkotas there is no change in the size. The precode size is max(precode data size, precode code size), and the data size is what we pad the code to in this change. This fixes a problem where, because the code was smaller than the data, the remainder of the code slot after the real precode code was filled with code from the next block, which caused misclassification of a StubPrecode as a FixupPrecode in PrecodeStubManager::DoTraceStub.
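A rough sketch of how the misdetection can happen (the probe shape and the signature-byte offset are assumptions for illustration; the real classification logic lives in PrecodeStubManager::DoTraceStub):

#include <cstdint>

const int     kFixupCodeOffset = 8;    // FixupPrecode::FixupCodeOffset (arm64)
const uint8_t kFixupType       = 0x0B; // FixupPrecode::Type (arm64)
const int     kTypeByteOffset  = 3;    // hypothetical signature-byte position

bool LooksLikeFixupPrecode(const uint8_t* stubStartAddress)
{
    // Candidate FixupPrecode start, assuming stubStartAddress points
    // FixupCodeOffset bytes into a FixupPrecode.
    const uint8_t* candidate = stubStartAddress - kFixupCodeOffset;

    // If the tail of a StubPrecode slot contains bytes copied from the
    // start of FixupPrecodeCode, this signature test can succeed even
    // though no FixupPrecode actually lives at `candidate`.
    return candidate[kTypeByteOffset] == kFixupType;
}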

@noahfalk (Member)

> The precode size is max(precode data size, precode code size), and the data size is what we pad the code to in this change.

It does suggest that in the future we might be able to improve perf by packing the code more efficiently, but I'm happy to separate optional future perf improvements from the current PR, which fixes a bug and incurs no perf regression.

@janvorli (Member)

@mikelle-rogers could you please make the same change for x86, x64, and arm too, so that we are consistent? riscv64 and loongarch64 have code sizes larger than the data size, so there is no need to change those. You can find the FixupPrecode sizes here:

struct FixupPrecode
{
#if defined(TARGET_AMD64)
    static const int Type = 0xFF;
    static const SIZE_T CodeSize = 24;
    static const int FixupCodeOffset = 6;
#elif defined(TARGET_X86)
    static const int Type = 0xFF;
    static const SIZE_T CodeSize = 24;
    static const int FixupCodeOffset = 6;
#elif defined(TARGET_ARM64)
    static const int Type = 0x0B;
    static const SIZE_T CodeSize = 24;
    static const int FixupCodeOffset = 8;
#elif defined(TARGET_ARM)
    static const int Type = 0xCF;
    static const SIZE_T CodeSize = 12;
    static const int FixupCodeOffset = 4 + THUMB_CODE;
#elif defined(TARGET_LOONGARCH64)
    static const int Type = 0x3;
    static const SIZE_T CodeSize = 32;
    static const int FixupCodeOffset = 12;
#elif defined(TARGET_RISCV64)
    static const int Type = 0x97;
    static const SIZE_T CodeSize = 32;
    static const int FixupCodeOffset = 10;
#endif // TARGET_AMD64

And the StubPrecode sizes here:
#if defined(TARGET_AMD64)
    static const BYTE Type = 0x4C;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_X86)
    static const BYTE Type = 0xA1;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_ARM64)
    static const int Type = 0x4A;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_ARM)
    static const int Type = 0xFF;
    static const SIZE_T CodeSize = 12;
#elif defined(TARGET_LOONGARCH64)
    static const int Type = 0x4;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_RISCV64)
    static const int Type = 0x17;
    static const SIZE_T CodeSize = 24;
#endif // TARGET_AMD64

It would also be good to change the following asserts:

_ASSERTE((SIZE_T)((BYTE*)StubPrecodeCode##size##_End - (BYTE*)StubPrecodeCode##size) <= StubPrecode::CodeSize); \

and
_ASSERTE((SIZE_T)((BYTE*)StubPrecodeCode_End - (BYTE*)StubPrecodeCode) <= StubPrecode::CodeSize);

to use == instead of <=. That would make sure the code exactly fills the precode "slot".
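
Concretely, once the template is padded to exactly StubPrecode::CodeSize bytes, the non-macro variant would become:

_ASSERTE((SIZE_T)((BYTE*)StubPrecodeCode_End - (BYTE*)StubPrecodeCode) == StubPrecode::CodeSize);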

@janvorli (Member)

> It does suggest that in the future we might be able to improve the perf by more efficiently packing the code

@noahfalk I considered that when I implemented these stubs during the W^X work, but it would actually waste RW memory unless the data size was an integer multiple of the (possibly padded) code size. Currently, we interleave code and data pages in memory 1:1. If the data size were e.g. 1.5 times the code size, we would need 2 data pages for every code page, and the second data page would always be half empty.
While arm64 actually has a data size twice the code size, so we could still pack it efficiently with 1 code page followed by 2 data pages, this is not the case for the other architectures.
For example, on x64 the StubPrecode code size is 13 bytes, the FixupPrecode code size is 19 bytes, and the data size for each is 24 bytes; on arm32 the StubPrecode code size is 8 bytes, the FixupPrecode code size is 12 bytes, and the data size for each is 12 bytes.
Also, having a different code/data page ratio for each architecture would complicate the interleaved heap management (the interleaved heap is the heap that allocates memory for these precodes).
It would also mean that each precode on the page would have a different relative offset to its data slot, so instead of a single stub template we would need a whole page of stubs. In fact, we don't interleave 1 code page with 1 data page; we use a sequence of code pages followed by a sequence of data pages, so we would need that many pages of the stub precompiled in coreclr.
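
A sketch of the addressing property the 1:1 interleaving buys (the page size and helper name are illustrative, not the actual runtime identifiers):

#include <cstdint>

const uintptr_t kStubPageSize = 0x4000; // illustrative; the real value is platform-dependent

// With code and data pages interleaved 1:1, every precode's data slot sits
// at the same offset on the page immediately following its code page, so a
// single stub template can reach its data with one fixed displacement. A
// fractional data/code size ratio would give each precode on a page a
// different displacement, forcing a template per slot.
uintptr_t GetPrecodeData(uintptr_t precodeCodeAddress)
{
    return precodeCodeAddress + kStubPageSize;
}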

Regarding the StubPrecode, there actually is a different opportunity to reduce the size. The StubPrecode data has an extra pointer-sized slot for a type, because the same data layout is used for both StubPrecode and NDirectImportPrecode. I think I made that slot pointer-sized to ensure the code start is always aligned on the pointer size for perf reasons, but I am not sure. We could experiment to see whether making it smaller would still work fine perf-wise.
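
A rough sketch of the layout being described (field names and order are approximations, not the exact runtime definition):

#include <cstdint>

struct StubPrecodeData
{
    void*   MethodDesc; // pointer-sized
    void*   Target;     // pointer-sized
    uint8_t Type;       // distinguishes StubPrecode from NDirectImportPrecode
    // On 64-bit targets alignment pads the struct by another 7 bytes, so the
    // one-byte Type effectively costs the extra pointer-sized slot mentioned
    // above; shrinking it would trade that space against code alignment.
};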

@noahfalk (Member)

Thanks for the additional background, @janvorli! I didn't mean to imply that optimizing it further would be easy or that anyone should prioritize it, only that in theory it seemed possible. I agree that it would certainly add complexity and probably involve some other tradeoffs. I'm happy to leave it to you and others working in that area to decide if and when such an optimization would be worthwhile.
