Add logging and metrics for slow HTTP requests #1540

jbearer · 2024-06-03T18:10:26Z

In Cappuccino, we observed poor performance while DA nodes were syncing, because a missing index on the payload hash field made Postgres queries slow. I had previously seen occasional CPU spikes and slow requests when running the nasty client, so ~~I suspect this issue would have been caught earlier, before deploying to production, if these types of performance issues uncovered by the nasty client were easier to diagnose~~ (turns out we weren't doing queries by payload hash at all). This PR aims to make such issues easier to diagnose by improving visibility into slow requests (even those that don't actually time out) via logging and metrics.

This PR:

- Adds queries by payload hash
- Adds a new nasty client parameter to warn, but not error, if requests are too slow. This can be set fairly aggressively (for example, 1s) to catch even subtle performance problems.
- Adds three new metrics:
  - Slow request threshold, which helps to interpret the slow request counter
  - Counter of slow requests
  - Histogram of request latencies
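
To illustrate how the three metrics relate, here is a minimal in-memory sketch. It is not the code from this PR: the `SlowRequestMetrics` type, its field names, and the bucket layout are all hypothetical, and the real implementation presumably registers its metrics with the server's metrics system instead of keeping plain counters.

```rust
use std::time::Duration;

/// Hypothetical in-memory stand-in for the three metrics listed above.
/// All names and the bucket layout are illustrative assumptions.
#[derive(Debug)]
pub struct SlowRequestMetrics {
    /// Latency above which a request counts as slow. Exposing it as a
    /// metric lets a reader interpret the slow request counter.
    pub slow_threshold: Duration,
    /// Number of requests whose latency exceeded `slow_threshold`.
    pub slow_requests: u64,
    /// Upper bounds of the histogram buckets, in ascending order.
    pub bucket_bounds: Vec<Duration>,
    /// Per-bucket counts; the final entry is the overflow bucket.
    pub bucket_counts: Vec<u64>,
}

impl SlowRequestMetrics {
    pub fn new(slow_threshold: Duration, bucket_bounds: Vec<Duration>) -> Self {
        // One extra bucket for latencies above the largest bound.
        let bucket_counts = vec![0; bucket_bounds.len() + 1];
        Self {
            slow_threshold,
            slow_requests: 0,
            bucket_bounds,
            bucket_counts,
        }
    }

    /// Record one completed request. Returns true if it was slow.
    /// A slow request is logged as a warning, never treated as an error.
    pub fn observe(&mut self, latency: Duration) -> bool {
        // First bucket whose upper bound covers the latency, else overflow.
        let idx = self
            .bucket_bounds
            .iter()
            .position(|bound| latency <= *bound)
            .unwrap_or(self.bucket_bounds.len());
        self.bucket_counts[idx] += 1;

        let slow = latency > self.slow_threshold;
        if slow {
            self.slow_requests += 1;
            eprintln!(
                "warning: slow request: {:?} exceeds threshold {:?}",
                latency, self.slow_threshold
            );
        }
        slow
    }
}
```

With a 1s threshold, a request that takes 1.5s increments both the histogram and the slow request counter and produces a warning, while a 50ms request only lands in the histogram. Keeping the threshold next to the counter is what makes the counter meaningful when the threshold is tuned per deployment.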

This PR does not:

- Fix any performance issues

Key places to review:

- `nasty_client.rs`

How to test this PR:

Run `just demo-native`. Navigate to http://localhost:24011/status/metrics. Observe the `http_slow_requests` counter and the `http_request_latency` histogram.
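
If you would rather check the counter from a script than by eye, a small helper like the following could extract a value from the response body. This is only a sketch: it assumes the endpoint returns Prometheus-style text (`name value` lines, with `#` comment lines), which this PR description does not state, and the `metric_value` helper is hypothetical. The exported metric names may also carry a prefix or labels that this sketch does not handle.

```rust
/// Find the value of an unlabeled sample named `name` in Prometheus-style text.
/// Assumes sample lines of the form `name value`; lines starting with '#'
/// (HELP/TYPE comments) and blank lines are skipped.
pub fn metric_value(body: &str, name: &str) -> Option<f64> {
    body.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .find_map(|line| {
            let mut parts = line.split_whitespace();
            if parts.next()? == name {
                // The second whitespace-separated field is the sample value.
                parts.next()?.parse::<f64>().ok()
            } else {
                None
            }
        })
}
```

The body itself can be fetched with any HTTP client and passed in as a string, for example `metric_value(&body, "http_slow_requests")`.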

imabdulbasit approved these changes Jun 7, 2024

Base automatically changed from jb/nasty-client to main June 7, 2024 14:30

jbearer added 2 commits June 7, 2024 10:34

Add logging and metrics for slow HTTP requests

a634213

Add queries by payload hash

be24ee7

jbearer force-pushed the jb/nasty-client-slow-requests branch from b9fc39c to be24ee7 Compare June 7, 2024 14:34

jbearer marked this pull request as ready for review June 7, 2024 14:34

jbearer requested review from nomaxg, philippecamacho, ImJeremyHe, sveitser and tbro as code owners June 7, 2024 14:34

jbearer enabled auto-merge June 7, 2024 14:34

jbearer merged commit 5f2c5bf into main Jun 7, 2024
13 checks passed

jbearer deleted the jb/nasty-client-slow-requests branch June 7, 2024 14:51
