Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] [perf] Stability testing, out of memory exeption #5083

Open
timofeevmd opened this issue Sep 17, 2024 · 7 comments
Open

[BUG] [perf] Stability testing, out of memory exeption #5083

timofeevmd opened this issue Sep 17, 2024 · 7 comments
Assignees
Labels
Bug Something isn't working must-be-verifed Take into the nearest testing Performance non-functional

Comments

@timofeevmd
Copy link
Contributor

OS and Environment

linux, k8s

GIT commit hash

995da4e

Minimum working example / Steps to reproduce

java-sdk commit: 9aba55183e

reproduce command: ./mvnw gatling:test -DtargetURL=https://iroha2.test.tachi.soramitsu.co.jp -DremoteLogin= -DremotePassword= -Dintensity=800000 -DrampDuration=3600 -DmaxDuration=3601

full performance report

container logs: OOM.log.zip

Under a load of 1100 TPS, the system is able to function for only 44 minutes whitout any arrors and rejected transaction before encountering issues. (graph 1).
Following this period, a memory leak. (graph 2) .

graph 1
image-20240917-103749

graph 2
image-20240917-104103

Actual result

the product works stably

Expected result

after 45 minutes working catch "out if memory" exception

Logs

Log contents

Who can help to reproduce?

@timofeevmd

Notes

No response

@Mingela
Copy link
Contributor

Mingela commented Sep 17, 2024

@timofeevmd What kind of transactions are being sent? In case there are new entities being registered (like asset definitions, accounts, domains) those consume memory naturally so the leak have to be justified.

@Mingela
Copy link
Contributor

Mingela commented Sep 17, 2024

UPD; found out that's an asset transfer, though transactions (and blocks) itself consume memory so it's up to the team to analyze the consumption and conclude whether it's reasonable, thanks

@dima74
Copy link
Contributor

dima74 commented Sep 20, 2024

Based on my testing, size of a transaction with a single transfer asset instruction is ~1.8KB (CommittedTransaction struct). So for 1 million transactions iroha will consume ~1.8GB, for 3 million transactions ~5.6GB. This matches results in this issue.

I see two options how memory consumption can be improved:

  • Make kura drop old blocks from the memory. If we need such blocks later, we can load them from the disk (Kura cache #4954)
  • Optimize CommittedTransaction struct size (more on this later)

@dima74
Copy link
Contributor

dima74 commented Sep 20, 2024

Analysis of CommittedTransaction struct memory layout.

Key results:

  • Minimal possible size of a transaction is 1436 bytes
  • Typical size of a transaction with a single transfer asset instruction is 2028 bytes

Possible optimizations:

  • CommittedTransaction::error takes 200 bytes but is None for a successful transaction - perf: Reduce memory usage of CommittedTransaction #5089
  • Use read-only Box<[InstructionBox]> instead of Vec<InstructionBox> in Executable? Minimal capacity of a Vec is 4 elements, so for a transaction with a single instruction we have 176*3=528 unused space for a buffer
  • Something with public keys similar to lazy-decoding in perf: Make PublicKey decoding lazy inside WASM #5048? Single public key is 296 bytes, for a transfer asset transaction we have three accounts inside so this is total 888 bytes just for public keys
  • Use Arc<str> instead of Box<str> inside ChainId?
// self: 336
// heap: 1100 + instructions heap
struct CommittedTransaction {
    // self: 136
    // heap: 1100 + instructions heap
    value: SignedTransaction,
    // self: 200
    error: Option<error::TransactionRejectionReason>,
}

// self: 136
// heap: 1100 + instructions heap
struct SignedTransactionV1 {
    // self: 16
    // heap: 64
    signature: TransactionSignature,
    // self: 120
    // heap: 1036 + instructions heap
    payload: TransactionPayload,
}

// self: 120
// heap: 1036 + instructions heap
struct TransactionPayload {
    // self: 16, heap: 36 for default chain id
    chain: ChainId,
    // self: 24, heap: ~296
    authority: AccountId,
    // self: 8
    creation_time_ms: u64,
    // self: 32,
    // heap: 704 + instructions heap
    instructions: Executable,
    // self: 8
    time_to_live_ms: Option<NonZeroU64>,
    // self: 4
    nonce: Option<NonZeroU32>,
    // self: 24, heap: 0 if empty
    metadata: Metadata,
}

// self: 32
// heap: 176 * (instructions capacity) + instructions heap
//
// for a transaction with a single transfer instruction (4 is min Vec capacity):
// heap: 176 * 4 + ~592
enum Executable {
    Instructions(Vec<InstructionBox>),
    Wasm(WasmSmartContract),
}

// self: 176
enum InstructionBox {
    ...
}

// self: 96
// heap: ~592
struct Transfer<Asset, Numeric, Account> {
    // self: 56, heap: ~296
    source: AssetId,
    // self: 16
    object: Numeric,
    // self: 24, heap: ~296
    destination: AccountId,
}

// self: 56, heap: ~296
struct AssetId {
    // self: 24, heap: ~296
    account: AccountId,
    // self: 32, heap: 0 for domain and asset name <16
    definition: AssetDefinitionId,
}

// self: 24, heap: ~296 heap
struct AccountId {
    // self: 16, heap: 0 for domains with len <16
    domain: DomainId,
    // self: 8, heap: 296
    signatory: PublicKey,
}

// 296
pub enum PublicKeyInner {
    // 192
    Ed25519(ed25519::PublicKey),
    // 104
    Secp256k1(secp256k1::PublicKey),
    // 144
    BlsNormal(bls::BlsNormalPublicKey),
    // 288
    BlsSmall(bls::BlsSmallPublicKey),
}

@DCNick3
Copy link
Contributor

DCNick3 commented Sep 20, 2024

Use read-only Box<[InstructionBox]> instead of Vec in Executable? Minimal capacity of a Vec is 4 elements, so for a transaction with a single instruction we have 176*3=528 unused space for a buffer

Huh, I didn't know of such behavior. Notably, when doing vec![123], it allocates the capacity exactly, but with let mut v = Vec::new(); v.push(123); it does indeed overallocate the capacity to 4 to amortize future pushes.

(We also have a ConstVec<T> newtype around Box<[T]> to do the right thing with the scale encoding)

@dima74
Copy link
Contributor

dima74 commented Sep 20, 2024

Actually, we still have capacity=1 since we clone transactions in Sumeragi::try_create_block (this is a single peer case, not sure about multiple peers). Anyway would be nice to replace it with Box<[T]> to have explicit behaviour. Opened #5092

fn main() {
    let mut v = Vec::new();
    v.push(1);

    println!("{}", v.capacity());  // 4

    let v2 = v.clone();
    println!("{}", v2.capacity());  // 1
}

@dima74
Copy link
Contributor

dima74 commented Sep 27, 2024

With #5096 and #5103 merged, memory usage expected to be the following:

  • fixed ~200MB for last 128 blocks stored in kura in memory
  • unbounded ~260 bytes per transaction (State::transactions map)

So after 1mln transactions iroha should consume ~450MB, after 10mln transactions ~2.7GB.


State::transactions is map from HashOf<SignedTransaction> to NonZeroUsize, so entry size is 40 bytes (32+8). But this is BTreeMap, so some keys are stored multiple times in intermediate nodes. Also nodes store some data related to concread crate. This gives total ~260 bytes per transaction. Not sure how this can be optimized further.

@mversic mversic removed this from Iroha 2.0 Oct 8, 2024
@mversic mversic moved this from Work in Progress to Testing & Verification in Iroha Oct 8, 2024
@timofeevmd timofeevmd added Performance non-functional must-be-verifed Take into the nearest testing labels Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working must-be-verifed Take into the nearest testing Performance non-functional
Projects
None yet
Development

No branches or pull requests

7 participants