Memory profiler #1201

alv-around · 2025-01-10T11:50:39Z

Here is a minimal implementation of memory profiler for the sdk (#1029) using the official prometheus-client (see also here).

Currently the PR does:

the profiler runs under the memory_profiler feature and provides a breakdown text of the memory consumption of each method once the sdk is dropped. bellow the test sample of running cargo run --example sdk_app --features memory_profiler :

❯ cargo run --example sdk_app --features memory_profiler
   Compiling sysinfo v0.33.1
   Compiling openvm-sdk v0.2.0-alpha (/home/alv/code/openvm/crates/sdk)
    Finished `dev` profile [optimized + debuginfo] target(s) in 1m 02s
     Running `/home/alv/code/openvm/target/debug/examples/sdk_app`
cargo:rerun-if-env-changed=OPENVM_SKIP_BUILD
Using rustc: /home/alv/.rustup/toolchains/nightly-2024-10-30-x86_64-unknown-linux-gnu/bin/rustc
Building guest package: cargo +nightly-2024-10-30 build --target riscv32im-risc0-zkvm-elf -Z build-std=alloc,core,proc_macro,panic_abort,std -Z build-std-features=compiler-builtins-mem
cargo command: cargo +nightly-2024-10-30 build --target riscv32im-risc0-zkvm-elf -Z build-std=alloc,core,proc_macro,panic_abort,std -Z build-std-features=compiler-builtins-mem --manifest-path /home/alv/code/openvm/crates/sdk/guest/Cargo.toml --target-dir /home/alv/code/openvm/crates/sdk/guest/target --profile release
openvm-sdk-example-test: Starting build for riscv32im-risc0-zkvm-elf
openvm-sdk-example-test:     Finished `release` profile [optimized] target(s) in 0.61s
public values output: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# HELP memory usage memory used in process.
# TYPE memory usage gauge
memory usage{method="Build"} 6440038400
memory usage{method="Transpile"} 6436044800
memory usage{method="Commit"} 6471786496
memory usage{method="KeyGen"} 6456745984
memory usage{method="AggVerify"} 7193219072
memory usage{method="Execute"} 6435389440
memory usage{method="Proof"} 6766968832
# EOF

For the moment, I put the profiler behind the memory_profiler feature, so I don't need to touch existing code. I wanted to avoid making any change to the sdk interface, without getting more feedback.

Possible adjustments to the code:

Create macro to take the measurements instead of calling in the middle of the method
Change extend the interface of Sdk struct
Other ways to display the information to the user
Make registry thread-safe (??)
...

These I some ideas that I had, but I wanted to share them here before jumping into the implementation. Any other ideas not cover here are more than welcome.

EDIT:
Ok, I just saw this tip, I may come back and swap prometheus-client for metrics-process 🫠

jonathanpwang · 2025-01-10T19:48:19Z

@alv-around very exciting! I'm not sure about prometheus-client vs metrics-process. I suggested the latter just because that's what I saw reth used.

I haven't looked into the PR yet, but I guess there are two different approaches:

either you get profile info at specific points in time
you have a separate async loop that periodically gets profile snapshots, and then you hook that up to a metrics recorder to either display or aggregate the info

It sounds like you currently went with (1)? It'd be nice if we could also support (2), or maybe feature flag between them somehow. I'll have more concrete suggestions after I look at the code.

also tagging @luffykai to take a look

jonathanpwang · 2025-01-10T19:50:25Z

crates/sdk/src/lib.rs

@@ -148,6 +178,8 @@ impl Sdk {
        VC::Periphery: Chip<SC>,
    {
        let app_prover = AppProver::new(app_pk.app_vm_pk.clone(), app_committed_exe);
+        #[cfg(feature = "memory_profiler")]
+        self.profiler.update_memory_usage(Method::Proof);


the issue is this only gets the memory usage at this point in time, but I think isn't able to capture peak memory usage during the overall function span?

jonathanpwang

Amazing progress so far!

Adding to my other comments after taking a skim: I think if possible I'd like something where: the memory profiling is a separate loop/thread, we collect metrics via recorder and the different sdk functions are scoped by labels (see https://github.com/openvm-org/openvm/blob/main/docs/crates/metrics.md), and then in some separate process/post-process in openvm-prof (crates/prof) we can set up some analyzer or grafana dashboard that then let's us put more concrete numbers to different workloads.

alv-around · 2025-01-13T18:35:14Z

@alv-around very exciting! I'm not sure about prometheus-client vs metrics-process. I suggested the latter just because that's what I saw reth used.

I haven't looked into the PR yet, but I guess there are two different approaches:
1. either you get profile info at specific points in time

2. you have a separate async loop that periodically gets profile snapshots, and then you hook that up to a metrics recorder to either display or aggregate the info
It sounds like you currently went with (1)? It'd be nice if we could also support (2), or maybe feature flag between them somehow. I'll have more concrete suggestions after I look at the code.

also tagging @luffykai to take a look

@jonathanpwang yes! completely agree with your feedback. the last time I checked the issue was last year, and I was completely unaware of the process-metrics. The process-metrics way and your proposal, makes much more sense :)

Just a final notice, I would go for threads over async loops, as proving will block most likely the metrics measurement.

alv-around · 2025-01-14T17:42:19Z

@alv-around very exciting! I'm not sure about prometheus-client vs metrics-process. I suggested the latter just because that's what I saw reth used.

I haven't looked into the PR yet, but I guess there are two different approaches:
1. either you get profile info at specific points in time

2. you have a separate async loop that periodically gets profile snapshots, and then you hook that up to a metrics recorder to either display or aggregate the info
It sounds like you currently went with (1)? It'd be nice if we could also support (2), or maybe feature flag between them somehow. I'll have more concrete suggestions after I look at the code.

also tagging @luffykai to take a look

@jonathanpwang yes! completely agree with your feedback. the last time I checked the issue was last year, and I was completely unaware of the process-metrics. The process-metrics way and your proposal, makes much more sense :)

Just a final notice, I would go for threads over async loops, as proving will block most likely the metrics measurement.

alv-around · 2025-01-22T20:58:23Z

Hi @jonathanpwang, here is a small status update:

So I haven't got to sit much on it, and I struggle a bit in adding the metrics_tracing_context and to render anything apart from metric-process. I opened an issue to see if I get some additional insight on the matter.

Appart from that, currently I set up the "prometheus push gateway", basically it pushes every 10 seconds the process-metrics to a middle instances from which the "classical" can pull the data. This has the advantage thas is: lightweight and runs better with batch jobs.

The alternative, of opening an endpoint for prometheus add a lot of depencies to the create (~around 60) and it does not work very well. This alternative would work for an api that wraps around the Sdk, and if that is goal, we might be bette off adding the metrics collection to the api than to the sdk itself.

Sofar, not to happy with the outcome... Anyways, just wanted to keep you in the loop. Let me know if you have any remarks.

alv-around added 10 commits January 8, 2025 15:00

profiler wip

eff7933

simple memory profiler

e9e9fd0

add profiler to sdk struct and complete methods

954cc06

remove unused dependencies

6ecd0f3

memory_profiler feature added

038be10

undo autoformater indent

286ff97

update guest code

6aa8dfb

remove unused dependencies

4a7f225

updated Cargo.lock

0125adc

add sysinfo crate

395abf0

jonathanpwang requested review from jonathanpwang and luffykai January 10, 2025 19:48

jonathanpwang reviewed Jan 10, 2025

View reviewed changes

alv-around added 5 commits January 16, 2025 16:18

prometheus recorder

9f57a09

sdk with profiler

05c8db4

prometheus push gateway

ebbece3

update examples

4d52a19

remove changes to integration tests

6a54eb7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory profiler #1201

Memory profiler #1201

alv-around commented Jan 10, 2025 •

edited

Loading

jonathanpwang commented Jan 10, 2025

jonathanpwang Jan 10, 2025

jonathanpwang left a comment

alv-around commented Jan 13, 2025

alv-around commented Jan 14, 2025

alv-around commented Jan 22, 2025

Memory profiler #1201

Are you sure you want to change the base?

Memory profiler #1201

Conversation

alv-around commented Jan 10, 2025 • edited Loading

jonathanpwang commented Jan 10, 2025

jonathanpwang Jan 10, 2025

Choose a reason for hiding this comment

jonathanpwang left a comment

Choose a reason for hiding this comment

alv-around commented Jan 13, 2025

alv-around commented Jan 14, 2025

alv-around commented Jan 22, 2025

alv-around commented Jan 10, 2025 •

edited

Loading