perf(jetsocat): increase the maximum JMUX message size
This has almost no effect on the throughput when there is a significant
delay, but the throughput is improved when the delay is very small or
nonexistent. The main benefit is reduced CPU usage, since fewer JMUX
messages are needed to carry the same amount of data.

1. Benchmark results before this patch

a. With 50ms delay on loopback

1 connection:
[  1] 0.0000-600.4197 sec  16.1 GBytes   230 Mbits/sec

2 connections:
[  1] 0.0000-605.0387 sec  8.19 GBytes   116 Mbits/sec
[  2] 0.0000-605.1395 sec  8.19 GBytes   116 Mbits/sec
[SUM] 0.0000-605.1395 sec  16.4 GBytes   233 Mbits/sec

10 connections:
[  3] 0.0000-625.7966 sec  1.69 GBytes  23.2 Mbits/sec
[  8] 0.0000-625.9956 sec  1.69 GBytes  23.2 Mbits/sec
[  1] 0.0000-626.0966 sec  1.69 GBytes  23.2 Mbits/sec
[  5] 0.0000-626.0964 sec  1.69 GBytes  23.2 Mbits/sec
[  2] 0.0000-626.1983 sec  1.69 GBytes  23.2 Mbits/sec
[  7] 0.0000-626.1964 sec  1.69 GBytes  23.2 Mbits/sec
[  6] 0.0000-626.1964 sec  1.69 GBytes  23.2 Mbits/sec
[  9] 0.0000-626.1981 sec  1.69 GBytes  23.2 Mbits/sec
[ 10] 0.0000-626.2973 sec  1.69 GBytes  23.2 Mbits/sec
[  4] 0.0000-626.3984 sec  1.69 GBytes  23.2 Mbits/sec
[SUM] 0.0000-626.3986 sec  16.9 GBytes   232 Mbits/sec

b. Without delay

1 connection:
[  1] 0.0000-600.0518 sec  1.33 TBytes  19.4 Gbits/sec

2 connections:
[  2] 0.0000-600.0706 sec   681 GBytes  9.75 Gbits/sec
[  1] 0.0000-600.0705 sec   681 GBytes  9.75 Gbits/sec
[SUM] 0.0000-600.0705 sec  1.33 TBytes  19.5 Gbits/sec

10 connections:
[  3] 0.0000-600.3608 sec   112 GBytes  1.60 Gbits/sec
[  5] 0.0000-600.3606 sec   112 GBytes  1.60 Gbits/sec
[  6] 0.0000-600.3605 sec   112 GBytes  1.60 Gbits/sec
[  8] 0.0000-600.3598 sec   112 GBytes  1.60 Gbits/sec
[  7] 0.0000-600.3594 sec   112 GBytes  1.60 Gbits/sec
[  1] 0.0000-600.3606 sec   112 GBytes  1.60 Gbits/sec
[  9] 0.0000-600.3597 sec   112 GBytes  1.60 Gbits/sec
[ 10] 0.0000-600.3606 sec   112 GBytes  1.60 Gbits/sec
[  2] 0.0000-600.3602 sec   112 GBytes  1.60 Gbits/sec
[  4] 0.0000-600.3719 sec   112 GBytes  1.60 Gbits/sec
[SUM] 0.0000-600.3721 sec  1.09 TBytes  16.0 Gbits/sec

2. Benchmark results after this patch

a. With 50ms delay on loopback

1 connection:
[  1] 0.0000-600.4552 sec  16.1 GBytes   231 Mbits/sec

2 connections:
[  1] 0.0000-605.1600 sec  8.16 GBytes   116 Mbits/sec
[  2] 0.0000-605.1599 sec  8.16 GBytes   116 Mbits/sec
[SUM] 0.0000-605.1599 sec  16.3 GBytes   232 Mbits/sec

10 connections:
[  8] 0.0000-625.8346 sec  1.69 GBytes  23.2 Mbits/sec
[  9] 0.0000-626.1828 sec  1.69 GBytes  23.2 Mbits/sec
[  2] 0.0000-626.1820 sec  1.69 GBytes  23.2 Mbits/sec
[  5] 0.0000-626.1817 sec  1.69 GBytes  23.2 Mbits/sec
[  6] 0.0000-626.1815 sec  1.69 GBytes  23.2 Mbits/sec
[  4] 0.0000-626.1827 sec  1.69 GBytes  23.2 Mbits/sec
[  3] 0.0000-626.1814 sec  1.69 GBytes  23.2 Mbits/sec
[  7] 0.0000-626.1821 sec  1.69 GBytes  23.2 Mbits/sec
[  1] 0.0000-626.2831 sec  1.69 GBytes  23.1 Mbits/sec
[ 10] 0.0000-626.2819 sec  1.69 GBytes  23.1 Mbits/sec
[SUM] 0.0000-626.2832 sec  16.9 GBytes   232 Mbits/sec

b. Without delay

1 connection:
[  1] 0.0000-600.0402 sec  1.68 TBytes  24.6 Gbits/sec

2 connections:
[  1] 0.0000-600.0628 sec   752 GBytes  10.8 Gbits/sec
[  2] 0.0000-601.0794 sec   751 GBytes  10.7 Gbits/sec
[SUM] 0.0000-601.0794 sec  1.47 TBytes  21.5 Gbits/sec

10 connections:
[  6] 0.0000-600.3015 sec   127 GBytes  1.82 Gbits/sec
[  3] 0.0000-600.3014 sec   127 GBytes  1.82 Gbits/sec
[  7] 0.0000-600.3012 sec   127 GBytes  1.82 Gbits/sec
[  5] 0.0000-600.2992 sec   127 GBytes  1.82 Gbits/sec
[  9] 0.0000-600.3014 sec   127 GBytes  1.82 Gbits/sec
[  1] 0.0000-600.3006 sec   127 GBytes  1.82 Gbits/sec
[  2] 0.0000-600.3601 sec   127 GBytes  1.82 Gbits/sec
[ 10] 0.0000-600.3592 sec   127 GBytes  1.82 Gbits/sec
[  8] 0.0000-600.3604 sec   127 GBytes  1.82 Gbits/sec
[  4] 0.0000-600.3586 sec   127 GBytes  1.82 Gbits/sec
[SUM] 0.0000-600.3605 sec  1.24 TBytes  18.2 Gbits/sec
CBenoit committed Aug 18, 2024
1 parent 73f1716 commit 2ac9808
Showing 2 changed files with 88 additions and 7 deletions.
9 changes: 7 additions & 2 deletions crates/jmux-proxy/src/lib.rs
@@ -30,7 +30,8 @@ use tokio::task::JoinHandle;
use tokio_util::codec::FramedRead;
use tracing::{Instrument as _, Span};

const MAXIMUM_PACKET_SIZE_IN_BYTES: u16 = 4 * 1024; // 4 kiB
const DATA_PACKET_OVERHEAD: u16 = 8;
const MAXIMUM_PACKET_SIZE_IN_BYTES: u16 = 8 * 1024 + DATA_PACKET_OVERHEAD; // 8 kiB + packet overhead
const WINDOW_ADJUSTMENT_THRESHOLD: u32 = 4 * 1024; // 4 kiB

pub type ApiResponseSender = oneshot::Sender<JmuxApiResponse>;
@@ -777,7 +778,11 @@ impl DataReaderTask {
} = self;

let codec = tokio_util::codec::BytesCodec::new();
let mut bytes_stream = FramedRead::new(reader, codec);
let mut bytes_stream = FramedRead::with_capacity(
reader,
codec,
usize::from(MAXIMUM_PACKET_SIZE_IN_BYTES - DATA_PACKET_OVERHEAD),
);
let maximum_packet_size = usize::from(maximum_packet_size);

trace!("Started forwarding");
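
For illustration only (not part of the diff), here is a minimal, self-contained sketch of the read-side pattern this hunk adopts: a `FramedRead` built with `with_capacity` so that `BytesCodec` can yield chunks large enough to fill a whole JMUX DATA packet. `forward_payloads` and `send` are hypothetical stand-ins for the surrounding `DataReaderTask` code, not the actual implementation.

```rust
// Sketch only: mirrors the constants and the FramedRead::with_capacity call above.
use futures_util::StreamExt as _;
use tokio::io::AsyncRead;
use tokio_util::codec::{BytesCodec, FramedRead};

const DATA_PACKET_OVERHEAD: u16 = 8;
const MAXIMUM_PACKET_SIZE_IN_BYTES: u16 = 8 * 1024 + DATA_PACKET_OVERHEAD;

/// Hypothetical helper: reads from `reader` and hands each chunk (typically up to
/// 8 KiB) to `send`, which would wrap it into a JMUX DATA message in the real proxy.
async fn forward_payloads<R, F>(reader: R, mut send: F) -> std::io::Result<()>
where
    R: AsyncRead + Unpin,
    F: FnMut(bytes::BytesMut),
{
    // The initial read-buffer capacity matches one maximum JMUX payload, so a
    // single read can fill a whole DATA packet instead of several smaller ones.
    let mut bytes_stream = FramedRead::with_capacity(
        reader,
        BytesCodec::new(),
        usize::from(MAXIMUM_PACKET_SIZE_IN_BYTES - DATA_PACKET_OVERHEAD),
    );

    while let Some(chunk) = bytes_stream.next().await {
        send(chunk?);
    }

    Ok(())
}
```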
86 changes: 81 additions & 5 deletions docs/JMUX-proxy-performance.md
@@ -70,10 +70,9 @@ iperf -c "$ADDR" -p $PORT -P 10 -t 600

Let’s assume the script is in a file named `run_iperf.sh`.

Running `iperf` for long enough is important to ensure that the buffering happening at the socket level is not influencing the numbers too much.
When running less a minute, we end up measuring the rate at which `iperf` enqueue bytes into the socket’s buffer.
Filling the buffer can be done very quickly and can have a significant impact on the measured average speed.
10 minutes is long enough to obtain convergent results.
It's important to note that `iperf` should be run for an extended period to account for the initial filling of TCP socket buffers,
which can artificially inflate the average throughput if tested for less than a minute.
Running `iperf` for 10 minutes is enough to ensure the results accurately reflect the effective average throughput.
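
As a rough, illustrative order-of-magnitude check (the buffer size is an assumption, not a measured value): if the socket buffers absorb about 4 MB almost instantly at start-up, that one-time burst noticeably skews a short run but is lost in the noise over 10 minutes.

```
4 MB of buffering ≈ 32 Mbit transferred "for free" at start-up

 10 s test:  32 Mbit / 10 s  = +3.2 Mbit/s   (≈ +14 % on a 23 Mbit/s stream)
600 s test:  32 Mbit / 600 s ≈ +0.05 Mbit/s  (negligible)
```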

## Applied optimizations

@@ -274,7 +273,7 @@ The flow control algorithm, particularly the window size, is a critical parameter
Since such delays are common in almost all practical setups, it’s safe to say that this is the most important metric to optimize.
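
As an added illustration (not in the original text), the bandwidth-delay product shows why the window size dominates as soon as delay is present: to sustain the ~230 Mbit/s observed in the 50 ms benchmarks, at least that much data must be kept in flight, and proportionally more if the effective round-trip time is larger.

```
required window ≥ bandwidth × delay
                ≈ 230 Mbit/s × 0.050 s
                = 11.5 Mbit ≈ 1.4 MB
```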

Other optimizations, while beneficial, primarily serve to reduce CPU usage and increase throughput on very high-speed networks.
A speed of 30 Mbits/s is already considered high, but networks with throughput exceeding 1 Gbits/s also exist.
A speed of 30 Mbits/s is already considered high, but networks with throughput exceeding 1 Gbits/s also exist (e.g.: ultra-high speed local area networks).
Enhancing performance for these networks is valuable, particularly in reducing CPU usage as the volume of data processed increases.

Measurements indicate that our JMUX proxy should perform well, even on high-speed networks.
@@ -286,3 +285,80 @@ In real-world wide-area networks, packet loss will inevitably occur.

Nevertheless, these results provide valuable data, confirming that our optimizations are effective with a high degree of confidence.
While further optimization could be pursued to address more specific scenarios, the current implementation is likely sufficient for most practical purposes.

## 2025.2.0 update

Related patches:

- <https://github.com/Devolutions/devolutions-gateway/pull/974>
- <https://github.com/Devolutions/devolutions-gateway/pull/979>

### Results

```shell
./run_iperf.sh 5000
```
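
The 50 ms delay used in the first scenario below can be simulated on the loopback interface with `tc`/netem; the exact command used by the author is not shown in this excerpt, so the following is only an assumed setup.

```shell
# Assumed setup (not from the original document): add a 50 ms delay on loopback,
# run the benchmarks, then remove the qdisc again.
sudo tc qdisc add dev lo root netem delay 50ms
./run_iperf.sh 5000
sudo tc qdisc del dev lo root
```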

#### With 50ms delay on loopback

1 connection:

```
[ 1] 0.0000-600.4552 sec 16.1 GBytes 231 Mbits/sec
```

2 connections:

```
[ 1] 0.0000-605.1600 sec 8.16 GBytes 116 Mbits/sec
[ 2] 0.0000-605.1599 sec 8.16 GBytes 116 Mbits/sec
[SUM] 0.0000-605.1599 sec 16.3 GBytes 232 Mbits/sec
```

10 connections:

```
[ 8] 0.0000-625.8346 sec 1.69 GBytes 23.2 Mbits/sec
[ 9] 0.0000-626.1828 sec 1.69 GBytes 23.2 Mbits/sec
[ 2] 0.0000-626.1820 sec 1.69 GBytes 23.2 Mbits/sec
[ 5] 0.0000-626.1817 sec 1.69 GBytes 23.2 Mbits/sec
[ 6] 0.0000-626.1815 sec 1.69 GBytes 23.2 Mbits/sec
[ 4] 0.0000-626.1827 sec 1.69 GBytes 23.2 Mbits/sec
[ 3] 0.0000-626.1814 sec 1.69 GBytes 23.2 Mbits/sec
[ 7] 0.0000-626.1821 sec 1.69 GBytes 23.2 Mbits/sec
[ 1] 0.0000-626.2831 sec 1.69 GBytes 23.1 Mbits/sec
[ 10] 0.0000-626.2819 sec 1.69 GBytes 23.1 Mbits/sec
[SUM] 0.0000-626.2832 sec 16.9 GBytes 232 Mbits/sec
```

#### Without delay

1 connection:

```
[ 1] 0.0000-600.0402 sec 1.68 TBytes 24.6 Gbits/sec
```

2 connections:

```
[ 1] 0.0000-600.0628 sec 752 GBytes 10.8 Gbits/sec
[ 2] 0.0000-601.0794 sec 751 GBytes 10.7 Gbits/sec
[SUM] 0.0000-601.0794 sec 1.47 TBytes 21.5 Gbits/sec
```

10 connections:

```
[ 6] 0.0000-600.3015 sec 127 GBytes 1.82 Gbits/sec
[ 3] 0.0000-600.3014 sec 127 GBytes 1.82 Gbits/sec
[ 7] 0.0000-600.3012 sec 127 GBytes 1.82 Gbits/sec
[ 5] 0.0000-600.2992 sec 127 GBytes 1.82 Gbits/sec
[ 9] 0.0000-600.3014 sec 127 GBytes 1.82 Gbits/sec
[ 1] 0.0000-600.3006 sec 127 GBytes 1.82 Gbits/sec
[ 2] 0.0000-600.3601 sec 127 GBytes 1.82 Gbits/sec
[ 10] 0.0000-600.3592 sec 127 GBytes 1.82 Gbits/sec
[ 8] 0.0000-600.3604 sec 127 GBytes 1.82 Gbits/sec
[ 4] 0.0000-600.3586 sec 127 GBytes 1.82 Gbits/sec
[SUM] 0.0000-600.3605 sec 1.24 TBytes 18.2 Gbits/sec
```
