Adding padding to precode #111819

Open · wants to merge 1 commit into main

Conversation

@mikelle-rogers (Member) commented Jan 24, 2025

Add padding to the precode to align the StubPrecode with the offset of its data.
We have a single constant, “StubPrecode::CodeSize”, which is 24 in this case. This constant is used as the size of the code that gets copied from the template code. The effect is that instead of 12 bytes of StubPrecode code followed by e.g. 12 zeros, we get 12 bytes of StubPrecode code followed by 12 bytes from the beginning of the FixupPrecodeCode.
That leads to stubStartAddress - FixupPrecode::FixupCodeOffset being misdetected as a FixupPrecode, because part of the FixupPrecode code is in fact present there.
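A minimal sketch of the mechanism described above (the helper name, constants, and template symbol are illustrative, not the actual runtime identifiers):

#include <cstdint>
#include <cstring>

const size_t kCodeSize = 24; // StubPrecode::CodeSize on this target

extern const uint8_t StubPrecodeTemplate[]; // illustrative template symbol

void InstantiateStubPrecode(uint8_t* slot)
{
    // The runtime copies a fixed CodeSize bytes from the template. If the
    // template emits only 12 bytes of real code, the other 12 bytes of the
    // copy come from whatever follows it in the image -- here, the start of
    // FixupPrecodeCode. Padding the template itself out to CodeSize (this
    // PR's fix) makes those trailing bytes well-defined instead.
    std::memcpy(slot, StubPrecodeTemplate, kCodeSize);
}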

Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@jkotas requested a review from janvorli · January 24, 2025 23:25
@jkotas (Member) commented Jan 24, 2025

We try to keep these precodes as small as possible. We allocate a lot of them. Is this change going to be a measurable memory consumption regression?

@mikelle-rogers requested a review from a team · January 24, 2025 23:59
@janvorli (Member)

> We try to keep these precodes as small as possible. We allocate a lot of them. Is this change going to be a measurable memory consumption regression?

@jkotas there is no change in the size. The precode size is max(precode data size, precode code size), and the data size is what we pad the code to in this change. This fixes a problem where, because the code was smaller than the data, the remainder of the code slot after the real precode code was filled with code from the next block, which caused misclassification of a StubPrecode as a FixupPrecode in PrecodeStubManager::DoTraceStub.
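A rough sketch of how the misdetection can happen (the probe shape and the signature-byte offset are assumptions for illustration; the real classification logic lives in PrecodeStubManager::DoTraceStub):

#include <cstdint>

const int     kFixupCodeOffset = 8;    // FixupPrecode::FixupCodeOffset (arm64)
const uint8_t kFixupType       = 0x0B; // FixupPrecode::Type (arm64)
const int     kTypeByteOffset  = 3;    // hypothetical signature-byte position

bool LooksLikeFixupPrecode(const uint8_t* stubStartAddress)
{
    // Candidate FixupPrecode start, assuming stubStartAddress points
    // FixupCodeOffset bytes into a FixupPrecode.
    const uint8_t* candidate = stubStartAddress - kFixupCodeOffset;

    // If the tail of a StubPrecode slot contains bytes copied from the
    // start of FixupPrecodeCode, this signature test can succeed even
    // though no FixupPrecode actually lives at `candidate`.
    return candidate[kTypeByteOffset] == kFixupType;
}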

@noahfalk (Member)

> The precode size is max(precode data size, precode code size), and the data size is what we pad the code to in this change.

It does suggest that in the future we might be able to improve perf by packing the code more efficiently, but I'm happy to separate optional future perf improvements from the current PR, which fixes a bug and incurs no perf regression.

@janvorli (Member)

@mikelle-rogers could you please make the same change for x86, x64, and arm too, so that we are consistent? riscv64 and loongarch64 have code sizes larger than the data size, so there is no need to change those. You can find the FixupPrecode sizes here:

struct FixupPrecode
{
#if defined(TARGET_AMD64)
    static const int Type = 0xFF;
    static const SIZE_T CodeSize = 24;
    static const int FixupCodeOffset = 6;
#elif defined(TARGET_X86)
    static const int Type = 0xFF;
    static const SIZE_T CodeSize = 24;
    static const int FixupCodeOffset = 6;
#elif defined(TARGET_ARM64)
    static const int Type = 0x0B;
    static const SIZE_T CodeSize = 24;
    static const int FixupCodeOffset = 8;
#elif defined(TARGET_ARM)
    static const int Type = 0xCF;
    static const SIZE_T CodeSize = 12;
    static const int FixupCodeOffset = 4 + THUMB_CODE;
#elif defined(TARGET_LOONGARCH64)
    static const int Type = 0x3;
    static const SIZE_T CodeSize = 32;
    static const int FixupCodeOffset = 12;
#elif defined(TARGET_RISCV64)
    static const int Type = 0x97;
    static const SIZE_T CodeSize = 32;
    static const int FixupCodeOffset = 10;
#endif // TARGET_AMD64

And the StubPrecode sizes here:
#if defined(TARGET_AMD64)
    static const BYTE Type = 0x4C;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_X86)
    static const BYTE Type = 0xA1;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_ARM64)
    static const int Type = 0x4A;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_ARM)
    static const int Type = 0xFF;
    static const SIZE_T CodeSize = 12;
#elif defined(TARGET_LOONGARCH64)
    static const int Type = 0x4;
    static const SIZE_T CodeSize = 24;
#elif defined(TARGET_RISCV64)
    static const int Type = 0x17;
    static const SIZE_T CodeSize = 24;
#endif // TARGET_AMD64

It would also be good to change the following asserts:

_ASSERTE((SIZE_T)((BYTE*)StubPrecodeCode##size##_End - (BYTE*)StubPrecodeCode##size) <= StubPrecode::CodeSize); \

and
_ASSERTE((SIZE_T)((BYTE*)StubPrecodeCode_End - (BYTE*)StubPrecodeCode) <= StubPrecode::CodeSize);

to use == instead of <=. That would make sure the code exactly fills the precode "slot".
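
Concretely, once the template is padded to exactly StubPrecode::CodeSize bytes, the non-macro variant would become:

_ASSERTE((SIZE_T)((BYTE*)StubPrecodeCode_End - (BYTE*)StubPrecodeCode) == StubPrecode::CodeSize);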

@janvorli (Member)

> It does suggest that in the future we might be able to improve the perf by more efficiently packing the code

@noahfalk I considered that when I implemented these stubs during the W^X work, but it would actually waste RW memory unless the data size was an integer multiple of the (possibly padded) code size. Currently, we interleave code and data pages in memory 1:1. If the data size were e.g. 1.5 times the code size, we would need 2 data pages for every code page, and the second data page would always be half empty.
While arm64 actually has a data size twice the code size, so we could still pack it efficiently with 1 code page followed by 2 data pages, this is not the case for the other architectures.
For example, on x64 the StubPrecode code size is 13 bytes, the FixupPrecode code size is 19 bytes, and the data size for each is 24 bytes; on arm32 the StubPrecode code size is 8 bytes, the FixupPrecode code size is 12 bytes, and the data size for each is 12 bytes.
Also, having a different code/data page ratio for each architecture would complicate the interleaved heap management (the interleaved heap is the heap that allocates memory for these precodes).
It would also mean that each precode on the page would have a different relative offset to its data slot, so instead of a single stub template we would need a whole page of stubs. In fact, we don't interleave 1 code page with 1 data page; we use a sequence of code pages followed by a sequence of data pages, so we would need that many pages of the stub precompiled in coreclr.
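
A sketch of the addressing property the 1:1 interleaving buys (the page size and helper name are illustrative, not the actual runtime identifiers):

#include <cstdint>

const uintptr_t kStubPageSize = 0x4000; // illustrative; the real value is platform-dependent

// With code and data pages interleaved 1:1, every precode's data slot sits
// at the same offset on the page immediately following its code page, so a
// single stub template can reach its data with one fixed displacement. A
// fractional data/code size ratio would give each precode on a page a
// different displacement, forcing a template per slot.
uintptr_t GetPrecodeData(uintptr_t precodeCodeAddress)
{
    return precodeCodeAddress + kStubPageSize;
}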

Regarding the StubPrecode, there actually is a different opportunity to reduce the size. The StubPrecode data has an extra pointer-sized slot for a type, because the same data layout is used for both StubPrecode and NDirectImportPrecode. I think I made that slot pointer-sized to ensure the code start is always aligned on the pointer size for perf reasons, but I am not sure. We could experiment to see whether making it smaller would still work fine perf-wise.
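
A rough sketch of the layout being described (field names and order are approximations, not the exact runtime definition):

#include <cstdint>

struct StubPrecodeData
{
    void*   MethodDesc; // pointer-sized
    void*   Target;     // pointer-sized
    uint8_t Type;       // distinguishes StubPrecode from NDirectImportPrecode
    // On 64-bit targets alignment pads the struct by another 7 bytes, so the
    // one-byte Type effectively costs the extra pointer-sized slot mentioned
    // above; shrinking it would trade that space against code alignment.
};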

@noahfalk (Member)

Thanks for the additional background, @janvorli! I didn't mean to imply that optimizing it further would be easy or that anyone should prioritize it, only that in theory it seemed possible. I agree that it would certainly add complexity and probably involve some other tradeoffs. I'm happy to leave it to you and others working in that area to decide if and when such an optimization would be worthwhile.
