[vm] Introduce cachable idempotent calls
Adds a `CachableIdempotentCallInstr` that can be invoked via
`@pragma('vm:cachable-idempotent')` if the call site is
force-optimized.

The object pool is not visited by the scavenger, so we store the
results as unboxed integers. Consequently, only Dart functions that
return integers can be cached.

Cachable idempotent calls should never be inlined. After the first
call, the function will not be called again.

The call itself is on a slow path to avoid register spilling on the
fast path.

TEST=vm/cc/IRTest_CachableIdempotentCall
TEST=runtime/tests/vm/dart/cachable_idempotent_test.dart

Bug: #51618
Change-Id: I612e896f27add76f57796c060157e14cc687a0fd
Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-asan-linux-release-x64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-msan-linux-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-tsan-linux-release-x64-try,vm-aot-ubsan-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64c-try,vm-aot-win-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-simriscv64-try,vm-linux-debug-x64-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-tsan-linux-release-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/301601
Reviewed-by: Ryan Macnak <[email protected]>
Reviewed-by: Martin Kustermann <[email protected]>
dcharkes committed Oct 27, 2023
1 parent db3fddd commit 0cd55a1
Showing 31 changed files with 779 additions and 38 deletions.
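The caching behaviour described in the commit message can be modelled in a few lines. The following is a minimal sketch, not the VM implementation: `CallSiteCache`, `CachedCall`, and `SimulateMultipleCallSites` are hypothetical names, and treating a raw value of 0 as "nothing cached yet" is an assumption suggested by the `kSetToZero` snapshot behavior in this commit rather than something this page shows directly.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for one object-pool slot owned by a single call site.
// Assumption: a raw value of 0 means "nothing cached yet".
struct CallSiteCache {
  int64_t raw_value = 0;
};

// Returns the cached result if present; otherwise takes the "slow path",
// calls the target once, and stores the unboxed integer result in the slot.
inline int64_t CachedCall(CallSiteCache& slot,
                          const std::function<int64_t()>& target) {
  assert(target && "call target must be set");
  if (slot.raw_value != 0) return slot.raw_value;  // fast path: no call
  slot.raw_value = target();                       // slow path: first call
  return slot.raw_value;
}

// Mirrors testMultipleCallSites from the test in this commit: four call
// sites, each with its own slot, two of them inside a loop.
inline int64_t SimulateMultipleCallSites() {
  int64_t global_counter = 0;
  auto increment = [&global_counter](int64_t amount) {
    return global_counter += amount;
  };
  CallSiteCache site_a, site_b, site_c, site_d;
  int64_t result = CachedCall(site_a, [&] { return increment(1); });
  for (int i = 0; i < 11; ++i) {
    result = CachedCall(site_b, [&] { return increment(10); });
    result = CachedCall(site_c, [&] { return increment(10); });
  }
  result = CachedCall(site_d, [&] { return increment(100); });
  return result;  // 1 + 10 + 10 + 100
}
```

Each call site increments the counter exactly once regardless of how often the loop runs, which is why the test expects 121. Under the zero-means-empty assumption, a target that legitimately returns 0 would simply be called again in this model.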
2 changes: 2 additions & 0 deletions runtime/docs/pragmas.md
@@ -38,6 +38,8 @@ These pragmas can cause unsound behavior if used incorrectly and therefore are o
| `vm:exact-result-type` | [Declaring an exact result type of a method](compiler/pragmas_recognized_by_compiler.md#providing-an-exact-result-type) |
| `vm:recognized` | [Marking this as a recognized method](compiler/pragmas_recognized_by_compiler.md#marking-recognized-methods) |
| `vm:idempotent` | Method marked with this pragma can be repeated or restarted multiple times without change to its effect. Loading, storing of memory values are examples of this, while reads and writes from file are examples of non-idempotent methods. At present, use of this pragma is limited to driving inlining of force-optimized functions. |
| `vm:cachable-idempotent` | Calls to functions marked with this pragma cache the return value at the call site. Not supported on ia32. The call site must have the pragma `vm:force-optimize`. |
| `vm:force-optimize` | Functions marked with this pragma will be compiled with the optimized pipeline and may not deoptimize. |

## Pragmas ignored in user code

236 changes: 236 additions & 0 deletions runtime/tests/vm/dart/cachable_idempotent_test.dart
@@ -0,0 +1,236 @@
// Copyright (c) 2023, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

import 'dart:ffi';

import 'package:expect/expect.dart';

void main() {
  testMultipleIncrement();
  reset();
  testMultipleCallSites();
  reset();
  testManyArguments();
  reset();
  testNonIntArguments();
  reset();
  testLargeInt();
  reset();
  testIntArguments();
  reset();
  testDoubleArguments();
  print('done');
}

@pragma('vm:force-optimize')
void testMultipleIncrement() {
  int result = 0;
  final counter = makeCounter(100000);
  while (counter()) {
    // This call is a cacheable call,
    // which will lead to the counter no longer being incremented.
    // Make sure to return the value, so we can see that the boxing and
    // unboxing works as expected.
    result = cachedIncrement(/*must be const*/ 3);
  }
  // Since this call site is force optimized, we should never recompile and thus
  // we only ever increment the global counter once.
  Expect.equals(3, result);
}

/// Increments a global counter, unless the call site has cached the result.
///
/// Arguments passed to this function must be const.
/// Call sites should be rewritten to cache using the pool.
@pragma('vm:never-inline')
@pragma('vm:cachable-idempotent')
int cachedIncrement(int amount) {
  return _globalCounter += amount;
}

int _globalCounter = 0;

void reset() {
  print('reset');
  _globalCounter = 0;
}

/// Helper for vm:force-optimize for loops without instance calls.
///
/// A for loop uses the `operator+` on int.
bool Function() makeCounter(int count) {
  return () => count-- >= 0;
}

@pragma('vm:force-optimize')
void testMultipleCallSites() {
  int result = 0;
  final counter = makeCounter(10);
  result = cachedIncrement(1);
  while (counter()) {
    result = cachedIncrement(10);
    result = cachedIncrement(10);
  }
  result = cachedIncrement(100);
  // All call sites are cached individually.
  // Even if the arguments are identical.
  Expect.equals(121, result);
}

@pragma('vm:force-optimize')
void testManyArguments() {
  final result = manyArguments(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  Expect.equals(55, result);
}

@pragma('vm:never-inline')
@pragma('vm:cachable-idempotent')
int manyArguments(int i1, int i2, int i3, int i4, int i5, int i6, int i7,
    int i8, int i9, int i10) {
  return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + i10;
}

@pragma('vm:force-optimize')
void testNonIntArguments() {
  final result = lotsOfConstArguments(
    "foo",
    3.0,
    3,
    const _MyClass(_MyClass(42)),
  );

  Expect.equals(37, result);
}

@pragma('vm:never-inline')
@pragma('vm:cachable-idempotent')
int lotsOfConstArguments(String s, double d, int i, _MyClass m) {
  return [s, d, i, m].toString().length;
}

final class _MyClass {
  final Object i;
  const _MyClass(this.i);

  @override
  String toString() => '_MyClass($i)';
}

@pragma('vm:force-optimize')
void testLargeInt() {
  final counter = makeCounter(10);
  while (counter()) {
    if (is64bitsArch()) {
      final result1 = cachedIncrement(0x7FFFFFFFFFFFFFFF);
      Expect.equals(0x7FFFFFFFFFFFFFFF, result1);
      _globalCounter = 0;
      final result2 = cachedIncrement(0x8000000000000000);
      Expect.equals(0x8000000000000000, result2);
      _globalCounter = 0;
      final result3 = cachedIncrement(0xFFFFFFFFFFFFFFFF);
      Expect.equals(0xFFFFFFFFFFFFFFFF, result3);
    } else {
      final result1 = cachedIncrement(0x7FFFFFFF);
      Expect.equals(0x7FFFFFFF, result1);
      _globalCounter = 0;
      final result2 = cachedIncrement(0x80000000);
      Expect.equals(0x80000000, result2);
      _globalCounter = 0;
      final result3 = cachedIncrement(0xFFFFFFFF);
      Expect.equals(0xFFFFFFFF, result3);
    }
  }
}

bool is64bitsArch() => sizeOf<Pointer>() == 8;

@pragma('vm:force-optimize')
void testIntArguments() {
  final result = lotsOfIntArguments(
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
  );
  Expect.equals(36, result);

  // Do a second call with different values to prevent the argument values
  // propagating to the function body in TFA.
  final result2 = lotsOfIntArguments(
    101,
    102,
    103,
    104,
    105,
    106,
    107,
    108,
  );
  Expect.equals(836, result2);
}

@pragma('vm:never-inline')
@pragma('vm:cachable-idempotent')
int lotsOfIntArguments(
  int d1,
  int d2,
  int d3,
  int d4,
  int d5,
  int d6,
  int d7,
  int d8,
) {
  print([d1, d2, d3, d4, d5, d6, d7, d8]);
  return (d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8).floor();
}

@pragma('vm:force-optimize')
void testDoubleArguments() {
  final result = lotsOfDoubleArguments(
    1.0,
    2.0,
    3.0,
    4.0,
    5.0,
    6.0,
    7.0,
    8.0,
  );
  Expect.equals(36, result);

  // Do a second call with different values to prevent the argument values
  // propagating to the function body in TFA.
  final result2 = lotsOfDoubleArguments(
    101.0,
    102.0,
    103.0,
    104.0,
    105.0,
    106.0,
    107.0,
    108.0,
  );
  Expect.equals(836, result2);
}

@pragma('vm:never-inline')
@pragma('vm:cachable-idempotent')
int lotsOfDoubleArguments(
  double d1,
  double d2,
  double d3,
  double d4,
  double d5,
  double d6,
  double d7,
  double d8,
) {
  print([d1, d2, d3, d4, d5, d6, d7, d8]);
  return (d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8).floor();
}
1 change: 1 addition & 0 deletions runtime/tests/vm/vm.status
@@ -32,6 +32,7 @@ dart/snapshot_version_test: Skip # This test is a Dart1 test (script snapshot)
dart/stack_overflow_shared_test: Pass, Slow # Uses --shared-slow-path-triggers-gc flag.

[ $arch == ia32 ]
dart/cachable_idempotent_test: Skip # CachableIdempotent calls are not supported in ia32 because it has no object pool.
dart/disassemble_aot_test: SkipByDesign # IA32 does not support AOT.
dart/regress32597_2_test: Pass, Slow # Uses --optimization-counter-threshold=10 without a kernel service snapshot.
dart/regress38467_test: Pass, Slow # Uses --optimization-counter-threshold=10 without a kernel service snapshot.
3 changes: 3 additions & 0 deletions runtime/vm/app_snapshot.cc
@@ -3284,6 +3284,9 @@ class ObjectPoolDeserializationCluster : public DeserializationCluster {
static_cast<intptr_t>(switchable_call_miss_entry_point);
continue;
#endif // defined(DART_PRECOMPILED_RUNTIME)
case ObjectPool::SnapshotBehavior::kSetToZero:
entry.raw_value_ = 0;
continue;
default:
FATAL("Unexpected snapshot behavior: %d\n", snapshot_behavior);
}
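The `kSetToZero` case in the hunk above resets an entry's raw value when the object pool is read back from a snapshot. A minimal sketch of that effect follows; `PoolEntry`, `DeserializePool`, and `DemoSnapshotReset` are hypothetical and greatly simplified, and the reading that this clears cached call results so they are recomputed after loading is an inference, not something stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of an object-pool entry; the real
// ObjectPool layout is not shown on this page.
enum class SnapshotBehavior { kSnapshotable, kSetToZero };

struct PoolEntry {
  intptr_t raw_value;
  SnapshotBehavior behavior;
};

// Models reading a pool back from a snapshot: entries tagged kSetToZero
// come back as 0; every other entry keeps its stored value.
inline std::vector<PoolEntry> DeserializePool(std::vector<PoolEntry> pool) {
  for (PoolEntry& entry : pool) {
    if (entry.behavior == SnapshotBehavior::kSetToZero) {
      entry.raw_value = 0;
    }
  }
  return pool;
}

// Usage: an ordinary immediate survives, a cache-style slot is cleared.
inline void DemoSnapshotReset() {
  const std::vector<PoolEntry> restored = DeserializePool({
      {42, SnapshotBehavior::kSnapshotable},
      {121, SnapshotBehavior::kSetToZero},
  });
  assert(restored[0].raw_value == 42);
  assert(restored[1].raw_value == 0);
}
```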
27 changes: 27 additions & 0 deletions runtime/vm/compiler/assembler/assembler_arm.cc
@@ -1569,6 +1569,33 @@ void Assembler::LoadWordFromPoolIndex(Register rd,
}
}

void Assembler::StoreWordToPoolIndex(Register value,
                                     intptr_t index,
                                     Register pp,
                                     Condition cond) {
  ASSERT((pp != PP) || constant_pool_allowed());
  ASSERT(value != pp);
  // PP is tagged on ARM.
  const int32_t offset =
      target::ObjectPool::element_offset(index) - kHeapObjectTag;
  int32_t offset_mask = 0;
  if (Address::CanHoldLoadOffset(kFourBytes, offset, &offset_mask)) {
    str(value, Address(pp, offset), cond);
  } else {
    int32_t offset_hi = offset & ~offset_mask;  // signed
    uint32_t offset_lo = offset & offset_mask;  // unsigned
    // Inline a simplified version of AddImmediate(rd, pp, offset_hi).
    Operand o;
    if (Operand::CanHold(offset_hi, &o)) {
      add(TMP, pp, o, cond);
    } else {
      LoadImmediate(TMP, offset_hi, cond);
      add(TMP, pp, Operand(TMP), cond);
    }
    str(value, Address(TMP, offset_lo), cond);
  }
}

void Assembler::CheckCodePointer() {
#ifdef DEBUG
if (!FLAG_check_code_pointer) {
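The ARM store above falls back to splitting the pool offset into a high part that is added to the pool pointer and a low part that is encoded in the `str` instruction. The arithmetic can be checked in isolation. This is a sketch only: `SplitOffset` and `SplitPoolOffset` are hypothetical names, and the example mask `0xFFF` is an assumption (the real mask is produced by `Address::CanHoldLoadOffset`).

```cpp
#include <cassert>
#include <cstdint>

struct SplitOffset {
  int32_t hi;   // added to the pool pointer first (signed)
  uint32_t lo;  // encoded directly in the store instruction (unsigned)
};

// Splits `offset` so that hi + lo == offset and lo fits under `offset_mask`.
// Assumption: `offset_mask` is a low-bit mask such as 0xFFF.
inline SplitOffset SplitPoolOffset(int32_t offset, int32_t offset_mask) {
  SplitOffset parts;
  parts.hi = offset & ~offset_mask;
  parts.lo = static_cast<uint32_t>(offset & offset_mask);
  // The two halves must recombine to the original offset.
  assert(static_cast<int64_t>(parts.hi) + parts.lo == offset);
  return parts;
}
```

For example, `SplitPoolOffset(0x12345, 0xFFF)` yields `hi == 0x12000` and `lo == 0x345`. Because the low bits are masked off rather than divided out, the identity also holds for negative offsets.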
7 changes: 7 additions & 0 deletions runtime/vm/compiler/assembler/assembler_arm.h
@@ -983,6 +983,13 @@ class Assembler : public AssemblerBase {
                             intptr_t index,
                             Register pp = PP,
                             Condition cond = AL);
  // Store word to pool at the given offset.
  //
  // Note: clobbers TMP.
  void StoreWordToPoolIndex(Register value,
                            intptr_t index,
                            Register pp = PP,
                            Condition cond = AL);

  void LoadObject(Register rd, const Object& object, Condition cond = AL);
  void LoadUniqueObject(
28 changes: 28 additions & 0 deletions runtime/vm/compiler/assembler/assembler_arm64.cc
@@ -434,6 +434,34 @@ void Assembler::LoadWordFromPoolIndex(Register dst,
}
}

void Assembler::StoreWordToPoolIndex(Register src,
                                     intptr_t index,
                                     Register pp) {
  ASSERT((pp != PP) || constant_pool_allowed());
  ASSERT(src != pp);
  Operand op;
  // PP is _un_tagged on ARM64.
  const uint32_t offset = target::ObjectPool::element_offset(index);
  const uint32_t upper20 = offset & 0xfffff000;
  if (Address::CanHoldOffset(offset)) {
    str(src, Address(pp, offset));
  } else if (Operand::CanHold(upper20, kXRegSizeInBits, &op) ==
             Operand::Immediate) {
    const uint32_t lower12 = offset & 0x00000fff;
    ASSERT(Address::CanHoldOffset(lower12));
    add(TMP, pp, op);
    str(src, Address(TMP, lower12));
  } else {
    const uint16_t offset_low = Utils::Low16Bits(offset);
    const uint16_t offset_high = Utils::High16Bits(offset);
    movz(TMP, Immediate(offset_low), 0);
    if (offset_high != 0) {
      movk(TMP, Immediate(offset_high), 1);
    }
    str(src, Address(pp, TMP));
  }
}

void Assembler::LoadDoubleWordFromPoolIndex(Register lower,
Register upper,
intptr_t index) {
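The ARM64 store above has two fallbacks for offsets that cannot be encoded directly: an upper-20/lower-12 split, and materializing the offset in `TMP` from two 16-bit halves. The sketch below only checks that both decompositions recombine to the original offset. `Arm64OffsetParts` and `DecomposeOffset` are hypothetical names, and it assumes that the trailing `1` passed to `movk` selects the second 16-bit halfword (a left shift by 16).

```cpp
#include <cassert>
#include <cstdint>

// The pieces used by the two fallback paths for a large pool offset.
struct Arm64OffsetParts {
  uint32_t upper20;  // offset & 0xfffff000, added to PP into TMP
  uint32_t lower12;  // offset & 0x00000fff, used as the store's immediate
  uint16_t low16;    // loaded into TMP with movz
  uint16_t high16;   // merged into TMP with movk (skipped when zero)
};

inline Arm64OffsetParts DecomposeOffset(uint32_t offset) {
  Arm64OffsetParts parts;
  parts.upper20 = offset & 0xfffff000u;
  parts.lower12 = offset & 0x00000fffu;
  parts.low16 = static_cast<uint16_t>(offset & 0xffffu);
  parts.high16 = static_cast<uint16_t>(offset >> 16);
  // Either pair reconstructs the original offset.
  assert(parts.upper20 + parts.lower12 == offset);
  assert(((static_cast<uint32_t>(parts.high16) << 16) | parts.low16) ==
         offset);
  return parts;
}
```

Which path the assembler actually takes depends on `Address::CanHoldOffset` and `Operand::CanHold`, whose encoding rules are not modelled here.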
5 changes: 5 additions & 0 deletions runtime/vm/compiler/assembler/assembler_arm64.h
@@ -2173,6 +2173,11 @@ class Assembler : public AssemblerBase {
  // Note: the function never clobbers TMP, TMP2 scratch registers.
  void LoadWordFromPoolIndex(Register dst, intptr_t index, Register pp = PP);

  // Store word to pool at the given offset.
  //
  // Note: clobbers TMP.
  void StoreWordToPoolIndex(Register src, intptr_t index, Register pp = PP);

  void LoadDoubleWordFromPoolIndex(Register lower,
                                   Register upper,
                                   intptr_t index);
11 changes: 6 additions & 5 deletions runtime/vm/compiler/assembler/assembler_base.cc
@@ -381,11 +381,12 @@ intptr_t ObjectPoolBuilder::AddObject(
return AddObject(ObjectPoolBuilderEntry(&obj, patchable, snapshot_behavior));
}

-intptr_t ObjectPoolBuilder::AddImmediate(uword imm) {
-  return AddObject(
-      ObjectPoolBuilderEntry(imm, ObjectPoolBuilderEntry::kImmediate,
-                             ObjectPoolBuilderEntry::kNotPatchable,
-                             ObjectPoolBuilderEntry::kSnapshotable));
+intptr_t ObjectPoolBuilder::AddImmediate(
+    uword imm,
+    ObjectPoolBuilderEntry::Patchability patchable,
+    ObjectPoolBuilderEntry::SnapshotBehavior snapshotability) {
+  return AddObject(ObjectPoolBuilderEntry(
+      imm, ObjectPoolBuilderEntry::kImmediate, patchable, snapshotability));
 }

intptr_t ObjectPoolBuilder::AddImmediate64(uint64_t imm) {