Skip to content

Commit

Permalink
Add logging and metrics for slow HTTP requests (#1540)
Browse files Browse the repository at this point in the history
In Cappuccino, we observed poor performance while DA nodes were syncing,
due to a missing index on the payload hash field causing Postgres
queries to be slow. I had previously seen occasional CPU spikes and slow
requests from running the nasty client, so ~~I suspect this issue would
have been caught earlier, before deploying to production, if these types
of performance issues uncovered by the nasty client were easier to
diagnose~~ (turns out we weren't doing queries by payload hash at all).
This PR aims to make it so by improving visibility into slow requests
(even those that don't actually time out) via logging and metrics.

### This PR:
* Adds queries by payload hash
* Adds a new nasty client parameter to warn, but not error, if requests
are too slow. This can be set fairly aggressively to catch even subtle
performance errors, like 1s.
* Adds three new metrics:
* Slow request threshold, which helps to interpret the slow request
counter
  * Counter of slow requests
  * Histogram of request latencies

### This PR does not:

Fix any performance issues

### Key places to review:
`nasty_client.rs`

### How to test this PR: 
Run `just demo-native`. Navigate to
http://localhost:24011/status/metrics. Observe the `http_slow_requests`
counter and the `http_request_latency` histogram.
<!-- ### Things tested -->
<!-- Anything that was manually tested (that is not tested in CI). -->
<!-- E.g. building/running of docker containers. Changes to docker demo,
... -->
<!-- Especially mention anything untested, with reasoning and link an
issue to resolve this. -->

<!-- Complete the following items before creating this PR -->
<!-- [ ] Issue linked or PR description mentions why this change is
necessary. -->
<!-- [ ] PR description is clear enough for reviewers. -->
<!-- [ ] Documentation for changes (additions) has been updated (added).
-->
<!-- [ ] If this is a draft it is marked as "draft".  -->

<!-- To make changes to this template edit
https://github.com/EspressoSystems/.github/blob/main/PULL_REQUEST_TEMPLATE.md
-->
  • Loading branch information
jbearer authored Jun 7, 2024
2 parents ed1350a + be24ee7 commit 5f2c5bf
Show file tree
Hide file tree
Showing 3 changed files with 166 additions and 116 deletions.
3 changes: 0 additions & 3 deletions .env
Original file line number Diff line number Diff line change
Expand Up @@ -119,9 +119,6 @@ ESPRESSO_SEQUENCER_FETCH_RATE_LIMIT=25
# Query service stress test
ESPRESSO_NASTY_CLIENT_PORT=24011

# Query service stress test
ESPRESSO_NASTY_CLIENT_PORT=24011

# Openzeppelin Defender Deployment Profile
DEFENDER_KEY=
DEFENDER_SECRET=
Expand Down
3 changes: 2 additions & 1 deletion sequencer/api/public-env-vars.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,8 @@ variables = [
"ESPRESSO_COMMITMENT_TASK_PORT",
"ESPRESSO_COMMITMENT_TASK_REQUEST_TIMEOUT",
"ESPRESSO_DEPLOYER_OUT_PATH",
"ESPRESSO_NASTY_CLIENT_HTTP_TIMEOUT",
"ESPRESSO_NASTY_CLIENT_HTTP_TIMEOUT_ERROR",
"ESPRESSO_NASTY_CLIENT_HTTP_TIMEOUT_WARNING",
"ESPRESSO_NASTY_CLIENT_MAX_BLOCKING_POLLS",
"ESPRESSO_NASTY_CLIENT_MAX_OPEN_STREAMS",
"ESPRESSO_NASTY_CLIENT_MAX_RETRIES",
Expand Down
Loading

0 comments on commit 5f2c5bf

Please sign in to comment.