Add logging and metrics for slow HTTP requests (#1540)

In Cappuccino, we observed poor performance while DA nodes were syncing, due to a missing index on the payload hash field causing Postgres queries to be slow. I had previously seen occasional CPU spikes and slow requests from running the nasty client, so ~~I suspect this issue would have been caught earlier, before deploying to production, if these types of performance issues uncovered by the nasty client were easier to diagnose~~ (turns out we weren't doing queries by payload hash at all). This PR aims to make it so by improving visibility into slow requests (even those that don't actually time out) via logging and metrics. ### This PR: * Adds queries by payload hash * Adds a new nasty client parameter to warn, but not error, if requests are too slow. This can be set fairly aggressively to catch even subtle performance errors, like 1s. * Adds three new metrics: * Slow request threshold, which helps to interpret the slow request counter * Counter of slow requests * Histogram of request latencies ### This PR does not: Fix any performance issues ### Key places to review: `nasty_client.rs` ### How to test this PR: Run `just demo-native`. Navigate to http://localhost:24011/status/metrics. Observe the `http_slow_requests` counter and the `http_request_latency` histogram.
EspressoSystems · Jun 7, 2024 · 5f2c5bf · 5f2c5bf
2 parents ed1350a + be24ee7
commit 5f2c5bf
Show file tree

Hide file tree

Showing 3 changed files with 166 additions and 116 deletions.
diff --git a/.env b/.env
@@ -119,9 +119,6 @@ ESPRESSO_SEQUENCER_FETCH_RATE_LIMIT=25
 # Query service stress test
 ESPRESSO_NASTY_CLIENT_PORT=24011
 
-# Query service stress test
-ESPRESSO_NASTY_CLIENT_PORT=24011
-
 # Openzeppelin Defender Deployment Profile
 DEFENDER_KEY=
 DEFENDER_SECRET=

diff --git a/sequencer/api/public-env-vars.toml b/sequencer/api/public-env-vars.toml
@@ -24,7 +24,8 @@ variables = [
     "ESPRESSO_COMMITMENT_TASK_PORT",
     "ESPRESSO_COMMITMENT_TASK_REQUEST_TIMEOUT",
     "ESPRESSO_DEPLOYER_OUT_PATH",
-    "ESPRESSO_NASTY_CLIENT_HTTP_TIMEOUT",
+    "ESPRESSO_NASTY_CLIENT_HTTP_TIMEOUT_ERROR",
+    "ESPRESSO_NASTY_CLIENT_HTTP_TIMEOUT_WARNING",
     "ESPRESSO_NASTY_CLIENT_MAX_BLOCKING_POLLS",
     "ESPRESSO_NASTY_CLIENT_MAX_OPEN_STREAMS",
     "ESPRESSO_NASTY_CLIENT_MAX_RETRIES",