Enable AVX-512 for string/span Equals/StartsWith #84885

Merged
EgorBo merged 4 commits into dotnet:main from the string-unroll-avx512 branch on Apr 17, 2023

Conversation

EgorBo
Member

@EgorBo EgorBo commented Apr 15, 2023

Contributes to #77034

A small cleanup in impExpandHalfConstEqualsSIMD to make it SIMD-size agnostic. At some point I plan to unify this with LowerCallMemcmp, but currently they're different: this one supports OrdinalIgnoreCase and produces better codegen when the data is half constant (not just the length). LowerCallMemcmp will also be removed because we need a new phase where we can rely on SSA/VN to keep all existing optimizations.

Benchmark:

string _testData = // field
    "https://github.com/dotnet/runtime/blob/main/CONTRIBUTING.md?r=0";

[Benchmark]
public bool Test_IgnoreCase() => _testData.StartsWith(
    "https://github.com/dotnet/runtime/blob/main/CONTRIBUTING.md", 
    StringComparison.OrdinalIgnoreCase);

[Benchmark]
public bool Test() => _testData.StartsWith(
    "https://github.com/dotnet/runtime/blob/main/CONTRIBUTING.md",
    StringComparison.Ordinal);
| Method | Toolchain | Mean | Ratio |
|---|---|---|---|
| Test_IgnoreCase | \runtime-PR\corerun.exe | 1.1675 ns | 1.00 |
| Test_IgnoreCase | \runtime-base\corerun.exe | 8.5046 ns | 7.28 |
| Test | \runtime-PR\corerun.exe | 0.7078 ns | 1.00 |
| Test | \runtime-base\corerun.exe | 2.9632 ns | 4.19 |

(Ryzen 7950x, win11-x64)

codegen diff for Test_IgnoreCase():

Was:

; Method Prog:Test_IgnoreCase():bool:this
       mov      rcx, gword ptr [rcx+08H]
       mov      rdx, 0x276002095A8      ; 'https://github.com/dotnet/runtime/blob/main/CONTRIBUTING.md'
       mov      r8d, 5
       cmp      dword ptr [rcx], ecx
       tail.jmp [System.String:StartsWith(System.String,int):bool:this] ;; not unrolled, size is too big for AVX1
; Total bytes of code: 28

New codegen:

; Method Prog:Test_IgnoreCase():bool:this
       vzeroupper 
       mov      rax, gword ptr [rcx+08H]
       cmp      dword ptr [rax+08H], 59
       jl       SHORT G_M52390_IG04
       vmovups  zmm0, zmmword ptr [rax+0CH]
       vporq    zmm0, zmm0, zmmword ptr [reloc @RWD00]
       vpxorq   zmm0, zmm0, zmmword ptr [reloc @RWD64]
       vmovups  zmm1, zmmword ptr [rax+42H]
       vporq    zmm1, zmm1, zmmword ptr [reloc @RWD128]
       vpxorq   zmm1, zmm1, zmmword ptr [reloc @RWD192]
       vporq    zmm0, zmm0, zmm1
       vxorps   zmm1, zmm1, zmm1
       vpcmpuq  k1, zmm0, zmm1, 0
       kortestb k1, k1
       setb     al
       movzx    rax, al
       jmp      SHORT G_M52390_IG05
G_M52390_IG04:  
       xor      eax, eax
G_M52390_IG05:  
       vzeroupper 
       ret      
RWD00  	dq	0020002000200020h, 0000000000000020h, 0020002000200020h, 0020000000200020h, 0020000000200020h, 0020002000200020h, 0020002000000020h, 0020002000200020h
RWD64  	dq	0070007400740068h, 002F002F003A0073h, 0068007400690067h, 0063002E00620075h, 0064002F006D006Fh, 0065006E0074006Fh, 00750072002F0074h, 006D00690074006Eh
RWD128 	dq	0020002000200020h, 0020000000200020h, 0000002000200020h, 0020002000200020h, 0020002000200000h, 0020002000200020h, 0020002000200020h, 0020002000000020h
RWD192 	dq	00690074006E0075h, 0062002F0065006Dh, 002F0062006F006Ch, 006E00690061006Dh, 006E006F0063002Fh, 0062006900720074h, 006E006900740075h, 0064006D002E0067h
; Total bytes of code: 110

AVX-512 is used when the length is in the [32..64] range (in UTF-16 chars).
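For readers who don't want to decode the asm, here is a scalar sketch of what the new codegen appears to do. It is not the JIT's implementation: `HalfConstSketch`, `DiffWindow` and `StartsWithIgnoreCase` are hypothetical names, the loop stands in for one ZMM-sized load, and the constant is assumed to be ASCII and already lowercased. Two 32-char windows cover the constant, the second one overlapping the first so that it ends exactly at the last char (for the 59-char constant above it starts at char 27, which matches the `[rax+42H]` load). Each window is ORed with a mask that is 0x20 only at letter positions (the RWD00/RWD128 data), XORed with the constant (RWD64/RWD192), and the two results are ORed together and checked for zero.

```csharp
using System;

static class HalfConstSketch
{
    // 32 UTF-16 chars = 64 bytes, i.e. one ZMM-sized load.
    const int ChunkChars = 32;

    // Scalar stand-in for one vector step: OR-reduce (input | mask) ^ constant
    // over a 32-char window. A zero result means every position matched.
    static int DiffWindow(string input, string loweredAscii, int start)
    {
        int diff = 0;
        for (int i = start; i < start + ChunkChars; i++)
        {
            char c = loweredAscii[i];
            // 0x20 only at letter positions, so non-letters must match exactly.
            int mask = (c >= 'a' && c <= 'z') ? 0x20 : 0;
            diff |= (input[i] | mask) ^ c;
        }
        return diff;
    }

    // Assumption: loweredAscii is ASCII, lowercased, and 32..64 chars long.
    public static bool StartsWithIgnoreCase(string input, string loweredAscii)
    {
        int len = loweredAscii.Length;
        if (len < ChunkChars || len > 2 * ChunkChars)
            throw new ArgumentOutOfRangeException(nameof(loweredAscii));

        // Same role as the length check that precedes the vector loads.
        if (input.Length < len)
            return false;

        // First window starts at char 0; the second ends at the last char,
        // overlapping the first when len < 64.
        int diff = DiffWindow(input, loweredAscii, 0)
                 | DiffWindow(input, loweredAscii, len - ChunkChars);
        return diff == 0;
    }
}
```

Calling `HalfConstSketch.StartsWithIgnoreCase(_testData, "https://github.com/dotnet/runtime/blob/main/contributing.md")` with the benchmark's `_testData` returns true, and it still returns true if `_testData` is upper-cased first. The overlap is harmless because re-checking chars that already matched cannot introduce a nonzero bit.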

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 15, 2023
@ghost ghost assigned EgorBo Apr 15, 2023
@ghost
ghost commented Apr 15, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #77034

A small cleanup in impExpandHalfConstEqualsSIMD to make it SIMD-size agnostic. At some point I plan to unify this with LowerCallMemcmp, but currently they're different: this one supports OrdinalIgnoreCase and produces better codegen when the data is half constant (not just the length). LowerCallMemcmp will also be removed because we need a new phase where we can rely on SSA/VN to keep all existing optimizations.

Benchmark:

private string TestData => 
    "https://github.com/dotnet/runtime/blob/main/CONTRIBUTING.md?r=0";

[Benchmark]
public bool Test() => TestData.StartsWith(
    "https://github.com/dotnet/runtime/blob/main/CONTRIBUTING.md", 
    StringComparison.OrdinalIgnoreCase);

codegen diff for Test(): https://www.diffchecker.com/5n4oF6gn/

| Method | Toolchain | Mean | Ratio |
|---|---|---|---|
| Test | \runtime-base\corerun.exe | 0.9315 ns | 1.00 |
| Test | \runtime-PR\corerun.exe | 8.7793 ns | 9.50 |

10x faster (Ryzen 7950x, win11-x64)

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo added the avx512 Related to the AVX-512 architecture label Apr 15, 2023
@EgorBo EgorBo mentioned this pull request Apr 15, 2023
56 tasks
@EgorBo
Member Author

EgorBo commented Apr 15, 2023

/benchmark json aspnet-citrine-win runtime

@pr-benchmarks
pr-benchmarks bot commented Apr 15, 2023

Benchmark started for json on aspnet-citrine-win with runtime. Logs: link

@EgorBo
Member Author

EgorBo commented Apr 15, 2023

@tannergooding @BruceForstall @dotnet/avx512-contrib PTAL. No impact on TE. SPMI diffs are empty due to missing contexts; lots of missing contexts mean it found quite a few places for the AVX-512 path. jit-utils found a +16 KB diff on the BCL.

@tannergooding
Member

This will also hit #84912, but @EgorBo said he plans on fixing both issues at once in a single PR, which will help avoid additional CI churn and mixing in dissimilar changes.

@EgorBo EgorBo merged commit 223d152 into dotnet:main Apr 17, 2023
@EgorBo EgorBo deleted the string-unroll-avx512 branch April 17, 2023 18:40
@ghost ghost locked as resolved and limited conversation to collaborators May 17, 2023