Use Avx10.2 Instructions in Floating Point Conversions #111775

Open
wants to merge 40 commits into
base: main
9ba454d
Add support for AVX10.2. Add AVX10.2 API surface and template tests. …
khushal1996 Dec 13, 2024
6aa4048
Add support and template tests for AVX10v2_V512
khushal1996 Dec 16, 2024
b0f4e6c
Add new coredistools.dll build from latest llvm repo
khushal1996 Nov 5, 2024
8304105
Limit JIT unit suite within the subsets which are stable in SDE.
Ruihan-Yin Aug 2, 2024
64e328a
Rename API as per latest API proposal discussions
khushal1996 Dec 16, 2024
08c7c26
fix sample tests in handwritten project
khushal1996 Dec 16, 2024
ef3101c
Revert "Limit JIT unit suite within the subsets which are stable in S…
khushal1996 Dec 16, 2024
b4de426
Limit JIT unit suite within the subsets which are stable in SDE.
Ruihan-Yin Aug 2, 2024
a2aba38
Allow a prefix of 0x00 for AVX10.2 instructions.
khushal1996 Dec 17, 2024
abac88e
Revert "Limit JIT unit suite within the subsets which are stable in S…
khushal1996 Dec 18, 2024
154988b
Limit JIT unit suite within the subsets which are stable in SDE.
Ruihan-Yin Aug 2, 2024
47f3e5a
remove developer comments from files
khushal1996 Dec 18, 2024
e6004f5
Enable all template tests and enable ymm embedded rounding
khushal1996 Dec 20, 2024
ae223f8
Make emitter independent of ISa and based on insOpts for ymm embedded…
khushal1996 Dec 25, 2024
885f1cb
Enable ymm embedded rounding based on architecture
khushal1996 Jan 7, 2025
12a5a26
Revert "Make emitter independent of ISa and based on insOpts for ymm …
khushal1996 Jan 7, 2025
161c3e9
Separate Avx10.2 unit testing framework from APX framework
khushal1996 Jan 7, 2025
5de4944
Revert "Limit JIT unit suite within the subsets which are stable in S…
khushal1996 Jan 8, 2025
2a9b3f8
Revert "Add new coredistools.dll build from latest llvm repo"
khushal1996 Jan 8, 2025
83868ab
Fix formatting
khushal1996 Jan 8, 2025
ca860a3
Use new keyword for class V512 to hide Avx10v1.V512 and correct CI er…
khushal1996 Jan 9, 2025
3082a84
Merge branch 'main' into kcm-avx102-api-public-pr
khushal1996 Jan 10, 2025
11c495a
Remove MinMax APis from lowering for numargs=2
khushal1996 Jan 10, 2025
87aca5f
Add docstrings for APIs
khushal1996 Jan 10, 2025
c26c67f
revert changes for sde execution of tests
khushal1996 Jan 13, 2025
38eeeff
Add appropriate comments from reviews
khushal1996 Jan 13, 2025
eee0f88
Apply suggestions from code review
khushal1996 Jan 13, 2025
e8bdd11
Merge branch 'kcm-avx102-api-public-pr' of https://github.com/khushal…
khushal1996 Jan 13, 2025
a5d8d95
Add emitter tests for XMM9/16 to make sure special handling does not …
khushal1996 Jan 21, 2025
e6e0f4b
Format code
khushal1996 Jan 21, 2025
abad2f7
Handle sizePrefix = 0 case when decoding evex instruction
khushal1996 Jan 21, 2025
a1e7cb4
Add assert in appropriate places
khushal1996 Jan 21, 2025
c88ee06
Club similar instructions together in perf calculation in emitxarch
khushal1996 Jan 21, 2025
a53b88d
Run formatting
khushal1996 Jan 21, 2025
e9ae4e2
Add assembly prints for debug assembly capturing for Avx10.2
khushal1996 Jan 22, 2025
be2abc0
Use correct size when running emitter tests
khushal1996 Jan 23, 2025
e8ff022
Ad appropriate comments and make review changes
khushal1996 Jan 23, 2025
3503794
Use AVX10.2 instructions in conversions
khushal1996 Jan 16, 2025
3c4f509
Merge branch 'main' into kcm-avx102-opt1
khushal1996 Jan 24, 2025
a171487
Run formatting
khushal1996 Jan 24, 2025
2 changes: 1 addition & 1 deletion src/coreclr/jit/codegenxarch.cpp
@@ -7430,7 +7430,7 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode)
noway_assert((dstSize == EA_ATTR(genTypeSize(TYP_INT))) || (dstSize == EA_ATTR(genTypeSize(TYP_LONG))));

// We shouldn't be seeing uint64 here as it should have been converted
// into a helper call by either front-end or lowering phase, unless we have AVX512F
// into a helper call by either front-end or lowering phase, unless we have AVX512F/AVX10.2
// accelerated conversions.
assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))) ||
compiler->canUseEvexEncodingDebugOnly());
8 changes: 8 additions & 0 deletions src/coreclr/jit/emitxarch.cpp
@@ -12316,6 +12316,14 @@ void emitter::emitDispIns(
case INS_vcvttsd2usi64:
case INS_vcvttss2usi32:
case INS_vcvttss2usi64:
case INS_vcvttsd2sis32:
case INS_vcvttsd2sis64:
case INS_vcvttss2sis32:
case INS_vcvttss2sis64:
case INS_vcvttsd2usis32:
case INS_vcvttsd2usis64:
case INS_vcvttss2usis32:
case INS_vcvttss2usis64:
{
assert(!id->idIsEvexAaaContextSet());
printf("%s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
34 changes: 33 additions & 1 deletion src/coreclr/jit/gentree.cpp
@@ -21630,7 +21630,39 @@ GenTree* Compiler::gtNewSimdCvtNode(var_types type,
GenTree* fixupVal;
bool isV512Supported = false;

if (compIsEvexOpportunisticallySupported(isV512Supported))
if (compOpportunisticallyDependsOn(InstructionSet_AVX10v2))
{
NamedIntrinsic cvtIntrinsic = NI_Illegal;
switch (simdTargetBaseType)
{
case TYP_INT:
cvtIntrinsic = (simdSize == 64) ? NI_AVX10v2_V512_ConvertToVectorInt32WithTruncationSaturation
: NI_AVX10v2_ConvertToVectorInt32WithTruncationSaturation;
break;

case TYP_UINT:
cvtIntrinsic = (simdSize == 64) ? NI_AVX10v2_V512_ConvertToVectorUInt32WithTruncationSaturation
: NI_AVX10v2_ConvertToVectorUInt32WithTruncationSaturation;
break;

case TYP_LONG:
cvtIntrinsic = (simdSize == 64) ? NI_AVX10v2_V512_ConvertToVectorInt64WithTruncationSaturation
: NI_AVX10v2_ConvertToVectorInt64WithTruncationSaturation;
break;

case TYP_ULONG:
cvtIntrinsic = (simdSize == 64) ? NI_AVX10v2_V512_ConvertToVectorUInt64WithTruncationSaturation
: NI_AVX10v2_ConvertToVectorUInt64WithTruncationSaturation;
break;

default:
{
unreached();
}
}
return gtNewSimdHWIntrinsicNode(type, op1, cvtIntrinsic, simdSourceBaseJitType, simdSize);
}
else if (compIsEvexOpportunisticallySupported(isV512Supported))
{
/*Generate the control table for VFIXUPIMMSD/SS
- For conversion to unsigned
106 changes: 76 additions & 30 deletions src/coreclr/jit/instr.cpp
@@ -2463,42 +2463,88 @@ instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr)
break;

case TYP_FLOAT:
switch (to)
if (compiler->compOpportunisticallyDependsOn(InstructionSet_AVX10v2))
{
case TYP_INT:
return INS_cvttss2si32;
case TYP_LONG:
return INS_cvttss2si64;
case TYP_FLOAT:
return ins_Move_Extend(TYP_FLOAT, false);
case TYP_DOUBLE:
return INS_cvtss2sd;
case TYP_ULONG:
return INS_vcvttss2usi64;
case TYP_UINT:
return INS_vcvttss2usi32;
default:
unreached();
switch (to)
{
case TYP_INT:
return INS_vcvttss2sis32;
case TYP_LONG:
return INS_vcvttss2sis64;
case TYP_FLOAT:
return ins_Move_Extend(TYP_FLOAT, false);
case TYP_DOUBLE:
return INS_cvtss2sd;
case TYP_ULONG:
return INS_vcvttss2usis64;
case TYP_UINT:
return INS_vcvttss2usis32;
default:
unreached();
}
}
else
{
switch (to)
{
case TYP_INT:
return INS_cvttss2si32;
case TYP_LONG:
return INS_cvttss2si64;
case TYP_FLOAT:
return ins_Move_Extend(TYP_FLOAT, false);
case TYP_DOUBLE:
return INS_cvtss2sd;
case TYP_ULONG:
return INS_vcvttss2usi64;
case TYP_UINT:
return INS_vcvttss2usi32;
default:
unreached();
}
}
break;

case TYP_DOUBLE:
switch (to)
if (compiler->compOpportunisticallyDependsOn(InstructionSet_AVX10v2))
{
case TYP_INT:
return INS_cvttsd2si32;
case TYP_LONG:
return INS_cvttsd2si64;
case TYP_FLOAT:
return INS_cvtsd2ss;
case TYP_DOUBLE:
return ins_Move_Extend(TYP_DOUBLE, false);
case TYP_ULONG:
return INS_vcvttsd2usi64;
case TYP_UINT:
return INS_vcvttsd2usi32;
default:
unreached();
switch (to)
{
case TYP_INT:
return INS_vcvttsd2sis32;
case TYP_LONG:
return INS_vcvttsd2sis64;
case TYP_FLOAT:
return INS_cvtsd2ss;
case TYP_DOUBLE:
return ins_Move_Extend(TYP_DOUBLE, false);
case TYP_ULONG:
return INS_vcvttsd2usis64;
case TYP_UINT:
return INS_vcvttsd2usis32;
default:
unreached();
}
}
else
{
switch (to)
{
case TYP_INT:
return INS_cvttsd2si32;
case TYP_LONG:
return INS_cvttsd2si64;
case TYP_FLOAT:
return INS_cvtsd2ss;
case TYP_DOUBLE:
return ins_Move_Extend(TYP_DOUBLE, false);
case TYP_ULONG:
return INS_vcvttsd2usi64;
case TYP_UINT:
return INS_vcvttsd2usi32;
default:
unreached();
}
}
break;
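
The selection logic in this hunk can be read as a small lookup keyed on source type, destination type, and ISA availability. The sketch below is a standalone model of that mapping, not JIT code: `FloatKind`, `IntKind`, and `SelectTruncatingCvt` are hypothetical names, and the returned strings are simply the `INS_` enumerator names from the diff with the prefix dropped.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins for the JIT's var_types; illustration only.
enum class FloatKind { Float, Double };
enum class IntKind { Int, UInt, Long, ULong };

// Returns the name of the truncating conversion the hunk above would pick.
// With AVX10.2 available every destination type gets a saturating form;
// otherwise the pre-existing instructions are used (in the real JIT the
// unsigned forms are only reachable when EVEX encoding is available).
inline std::string_view SelectTruncatingCvt(FloatKind from, IntKind to, bool hasAvx10v2)
{
    const bool isFloat = (from == FloatKind::Float);

    if (hasAvx10v2)
    {
        switch (to)
        {
            case IntKind::Int:   return isFloat ? "vcvttss2sis32"  : "vcvttsd2sis32";
            case IntKind::Long:  return isFloat ? "vcvttss2sis64"  : "vcvttsd2sis64";
            case IntKind::UInt:  return isFloat ? "vcvttss2usis32" : "vcvttsd2usis32";
            case IntKind::ULong: return isFloat ? "vcvttss2usis64" : "vcvttsd2usis64";
        }
    }
    else
    {
        switch (to)
        {
            case IntKind::Int:   return isFloat ? "cvttss2si32"   : "cvttsd2si32";
            case IntKind::Long:  return isFloat ? "cvttss2si64"   : "cvttsd2si64";
            case IntKind::UInt:  return isFloat ? "vcvttss2usi32" : "vcvttsd2usi32";
            case IntKind::ULong: return isFloat ? "vcvttss2usi64" : "vcvttsd2usi64";
        }
    }

    return {}; // Not reached for valid enumerator values.
}

// Small self-check showing how the model is meant to be used.
inline void CheckSelectTruncatingCvt()
{
    assert(SelectTruncatingCvt(FloatKind::Float, IntKind::Int, true) == "vcvttss2sis32");
    assert(SelectTruncatingCvt(FloatKind::Float, IntKind::Int, false) == "cvttss2si32");
}
```

Folding the ISA check into one function like this is only one way to view the change; the PR itself duplicates the `switch` under an `if`/`else`, which keeps the float-to-float and float-to-double cases textually next to the integer ones.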

6 changes: 3 additions & 3 deletions src/coreclr/jit/lowerxarch.cpp
@@ -869,7 +869,9 @@ GenTree* Lowering::LowerCast(GenTree* tree)

#if defined(TARGET_AMD64)
// Handle saturation logic for X64
if (varTypeIsFloating(srcType) && varTypeIsIntegral(dstType) && !varTypeIsSmall(dstType))
// Let InstructionSet_AVX10v2 pass through since it can handle the saturation
if (varTypeIsFloating(srcType) && varTypeIsIntegral(dstType) && !varTypeIsSmall(dstType) &&
!comp->compOpportunisticallyDependsOn(InstructionSet_AVX10v2))
{
// We should have filtered out float -> long conversion and
// converted it to float -> double -> long conversion.
@@ -886,10 +888,8 @@ GenTree* Lowering::LowerCast(GenTree* tree)
bool isV512Supported = false;
/*The code below is to introduce saturating conversions on X86/X64.
The C# equivalence of the code is given below -->

// Replace QNaN and SNaN with Zero
op1 = Avx512F.Fixup(op1, op1, Vector128.Create<long>(0x88), 0);

// Convert from double to long, replacing any values that were greater than or equal to MaxValue
with MaxValue
// Values that were less than or equal to MinValue will already be MinValue
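
For readers who want the saturation rules from the `LowerCast` comment in scalar form, here is a minimal sketch, assuming the three rules quoted there (NaN becomes zero, values at or above MaxValue become MaxValue, values at or below MinValue become MinValue). `SaturatingDoubleToInt64` is a hypothetical helper written for this illustration; it is not part of the JIT, and it models the intended result rather than the fixup sequence or the AVX10.2 instructions themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration only; not part of the JIT.
// Models the saturating double -> int64 rules described in the LowerCast comment.
inline int64_t SaturatingDoubleToInt64(double value)
{
    // NaN compares unequal to itself; the comment maps QNaN and SNaN to zero.
    if (value != value)
    {
        return 0;
    }

    // 2^63 is exactly representable as a double, whereas INT64_MAX is not.
    constexpr double twoPow63 = 9223372036854775808.0;

    if (value >= twoPow63)
    {
        return std::numeric_limits<int64_t>::max();
    }
    if (value <= -twoPow63)
    {
        return std::numeric_limits<int64_t>::min();
    }

    // In range: the cast truncates toward zero and cannot overflow.
    return static_cast<int64_t>(value);
}

// Small self-check covering the three saturation rules plus truncation.
inline void CheckSaturatingDoubleToInt64()
{
    const double inf = std::numeric_limits<double>::infinity();
    const double nan = std::numeric_limits<double>::quiet_NaN();

    assert(SaturatingDoubleToInt64(nan) == 0);
    assert(SaturatingDoubleToInt64(inf) == std::numeric_limits<int64_t>::max());
    assert(SaturatingDoubleToInt64(-inf) == std::numeric_limits<int64_t>::min());
    assert(SaturatingDoubleToInt64(-1.9) == -1);
}
```

The lowering change skips the fixup sequence when `InstructionSet_AVX10v2` is available because, per the added comment, the new instructions handle the saturation themselves.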
7 changes: 5 additions & 2 deletions src/coreclr/jit/morph.cpp
@@ -337,8 +337,9 @@ GenTree* Compiler::fgMorphExpandCast(GenTreeCast* tree)
// dstType = int for SSE41
// For pre-SSE41, the all src is converted to TYP_DOUBLE
// and goes through helpers.
&& (tree->gtOverflow() || (dstType == TYP_LONG) ||
!(canUseEvexEncoding() || (dstType == TYP_INT && compOpportunisticallyDependsOn(InstructionSet_SSE41))))
&&
(tree->gtOverflow() || (dstType == TYP_LONG && !compOpportunisticallyDependsOn(InstructionSet_AVX10v2)) ||
!(canUseEvexEncoding() || (dstType == TYP_INT && compOpportunisticallyDependsOn(InstructionSet_SSE41))))
#elif defined(TARGET_ARM)
// Arm: src = float, dst = int64/uint64 or overflow conversion.
&& (tree->gtOverflow() || varTypeIsLong(dstType))
@@ -372,6 +373,8 @@ GenTree* Compiler::fgMorphExpandCast(GenTreeCast* tree)
#else
#if defined(TARGET_AMD64)
// Following nodes are handled when lowering the nodes
// float -> ulong/uint/int/long fro AVX10.2
Suggested change
// float -> ulong/uint/int/long fro AVX10.2
// float -> ulong/uint/int/long for AVX10.2

// double -> ulong/uint/int/long for AVX10.2
// float -> ulong/uint/int for AVX512F
// double -> ulong/uint/long/int for AVX512F
// float -> int for SSE41