The Aeron benchmarks are open source and can be found here [1]. I wrote the benchmarks and ran both the AWS and GCP tests.
The particular transport benchmark mentioned here is an echo test where a message is sent between two machines and echoed back to the sender. This is a single-threaded test using a single stream (flow) between publisher and subscriber. On each box there is one application thread that sends and receives data, plus a standalone media driver component running in DEDICATED mode (i.e. with three separate threads: conductor/sender/receiver).
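For context, DEDICATED mode is selected when launching the media driver. A minimal sketch using Aeron's Java API is below; the busy-spin idle strategies are my assumption here, and the benchmark repo's actual tuning may differ:

```java
import io.aeron.driver.MediaDriver;
import io.aeron.driver.ThreadingMode;
import org.agrona.concurrent.BusySpinIdleStrategy;

public class DedicatedDriver
{
    public static void main(final String[] args) throws InterruptedException
    {
        // DEDICATED mode runs conductor, sender and receiver on three separate threads.
        final MediaDriver.Context ctx = new MediaDriver.Context()
            .threadingMode(ThreadingMode.DEDICATED)
            .conductorIdleStrategy(new BusySpinIdleStrategy()) // assumption: busy-spin for low latency
            .senderIdleStrategy(new BusySpinIdleStrategy())
            .receiverIdleStrategy(new BusySpinIdleStrategy());

        try (MediaDriver driver = MediaDriver.launch(ctx))
        {
            // The echo application runs as a separate process and attaches to this
            // driver via Aeron.connect(); block here to keep the driver alive.
            Thread.currentThread().join();
        }
    }
}
```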
AWS limits single-flow traffic [2]. This test used the cluster placement group (CPG) placement policy, which has a nominal limit of 10 Gbps. However, that limit applies only to TCP; for UDP the actual limit is 8 Gbps when a CPG is used (this is not documented anywhere).
Aeron adds a 32-byte header to each message, so a 288-byte payload becomes 320 bytes on the network. At a rate of 3M msgs/sec, Aeron was sending data at 7.68 Gbps (96% of the 8 Gbps limit) on a single CPU core. At that rate it was still meeting the p99 < 1 ms latency target.
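To make the arithmetic explicit, here is a quick back-of-the-envelope check of those figures (a standalone sketch, not part of the benchmark code):

```java
public class ThroughputCheck
{
    public static void main(final String[] args)
    {
        final long msgsPerSec = 3_000_000L;
        final int payloadBytes = 288;
        final int headerBytes = 32; // Aeron data frame header

        // 320 bytes/msg * 8 bits/byte * 3M msgs/sec = 7.68e9 bits/sec
        final long bitsPerSec = msgsPerSec * (payloadBytes + headerBytes) * 8;
        System.out.printf("%.2f Gbps%n", bitsPerSec / 1e9);                 // 7.68 Gbps
        System.out.printf("%.0f%% of the 8 Gbps cap%n", bitsPerSec / 8e7);  // 96%
    }
}
```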
We chose the `c5n.9xlarge` instance for this test because it reserves an entire CPU socket for a single VM. This was done to avoid interference from other VMs, i.e. the noisy-neighbour problem.
The GCP test was done on the `c3-highcpu-88` instance type. Again, choosing an instance with so many cores was done to avoid sharing a CPU socket with other VMs.
Aeron can easily saturate a 10 GbE NIC even without kernel bypass (given proper configuration). However, that is not a very useful test. A much harder problem is sending small/medium-sized messages at high rates and handling bursts of data.
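By "proper configuration" I mean knobs like MTU, OS socket buffers, and the receiver's flow-control window. As an illustration, these are the kinds of media driver properties involved; the property names come from Aeron's `Configuration` class, but the values below are examples, not the benchmark's actual settings:

```java
public class DriverTuning
{
    public static void main(final String[] args)
    {
        // Illustrative values only -- assumptions, not the benchmark's tuning.
        System.setProperty("aeron.mtu.length", "8192");               // bytes per datagram before fragmentation
        System.setProperty("aeron.socket.so_sndbuf", "2m");           // OS socket send buffer
        System.setProperty("aeron.socket.so_rcvbuf", "2m");           // OS socket receive buffer
        System.setProperty("aeron.rcv.initial.window.length", "2m");  // receiver flow-control window

        // ...then launch the media driver as usual, e.g. io.aeron.driver.MediaDriver.launch().
    }
}
```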
The Aeron transport was designed to achieve both low, predictable latency and high throughput at the same time. The two are not at odds with each other.
Thank you for your response. I retract my concerns. The protocol seems adequately performant with the new information you have provided.
The benchmark is clearly bottlenecked on I/O limits the vendor does not disclose, while being given excess compute for stability/"target deployment" reasons, and is thus not indicative of the protocol's actual compute bottleneck.
It might be beneficial to include these details in the documentation, so that the benchmarks do not appear to show much worse performance to a casual reader who does not know the internal structure of the benchmarked system. Alternatively, present a benchmark that is not artificially bottlenecked (or show the compute load of the bottlenecked implementation) to demonstrate the actual performance limits of the protocol.
[1] https://github.com/aeron-io/benchmarks
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-inst...