Evaluation of the reception throughput for 11 MB data exchange
Index
1. Introduction
2. Test description
3. Hardware test machine specification
4. Setting up the software test environment
4.1 Vendors version
4.2 Building the libraries
4.3 Test execution
5. DDS application configuration
5.1 DataWriter/DataReader configuration
5.2 Transport
6. Test Results
1. Introduction
Running a throughput experiment consists of measuring the maximum amount of data that can traverse a system, i.e. how much data the receiver receives per unit of time.
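As a simplified reference (each test suite computes this figure from its own internal counters), the reported throughput can be read as:
throughput [B/s] = bytes received by the subscriber / elapsed time [s]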
This study compares the throughput of three applications from two different DDS vendors: the first two correspond to Fast DDS configurations, while the third is a Cyclone DDS implementation. The applications are executed using each vendor's own throughput tests. All tests measure the throughput of a communication between a publisher and a subscriber running in separate processes on the same test machine.
Since the tests measure the throughput of an intra-machine communication, the Fast DDS tests use the SHM transport protocol with different publication modes (synchronous and asynchronous), while the Cyclone DDS test uses the UDP transport protocol (the only one available in this implementation).
2. Test description
Cyclone DDS
The tests used to measure the reception throughput are those implemented by Cyclone DDS. These tests can be found in the Cyclone DDS repository. The description of these tests, according to the README.rst provided by Cyclone DDS, is as follows.
The Cyclone DDS Throughput example allows the measurement of data throughput when receiving samples from a publisher.
- Design. It consists of 2 units:
  - Publisher: sends samples at a specified size and rate.
  - Subscriber: receives samples and outputs statistics about throughput.
- Scenario.
  - The publisher sends samples and allows specifying a payload size in bytes as well as whether to send data in bursts. Configurable:
    - payloadSize: the size of the payload in bytes.
    - burstInterval: the time interval between each burst in ms.
    - burstSize: the number of samples to send in each burst.
    - timeOut: the number of seconds the publisher should run for (0 = infinite).
    - partitionName: the name of the partition.
  - The subscriber will receive data and output the total amount received and the data rate [B/s]. It will also indicate if any samples were received out of order. A maximum number of cycles can be specified. Configurable:
    - maxCycles: the number of times to output statistics before terminating.
    - pollingDelay: the polling delay in ms (0 = event based).
    - partitionName: the name of the partition.
- Execution. Open 2 terminals.
  - In the first terminal, start the publisher by running publisher:
    ./publisher [payloadSize (bytes)] [burstInterval (ms)] [burstSize (samples)] [timeOut (seconds)] [partitionName]
    defaults:
    ./publisher 8192 0 1 0 "Throughput example"
  - In the second terminal, start the subscriber by running subscriber:
    ./subscriber [maxCycles (0=infinite)] [pollingDelay (ms, 0 = event based)] [partitionName]
    defaults:
    ./subscriber 0 0 "Throughput example"
Fast DDS
The tests used to measure the reception throughput are those implemented by Fast DDS. These tests can be found in the Fast DDS repository.
The following parameters configure the operation of the test:
- Payload size: the size of the messages sent.
- Recovery time: the amount of time to wait between bursts, giving the middleware some time to recover.
- Burst size (demand): the number of messages to send in each burst.
Fast DDS defines a set of tests named performance.throughput.<test_name>.
Each of these tests defines a sub-experiment for this framework. The sub-experiment considered is: performance.throughput.interprocess_reliable_shm.
3. Hardware test machine specification
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Model name: Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz
CPU MHz: 1000.838
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
Operating System: GNU/Linux
Kernel release: 4.15.0-64-generic
Kernel version: #73-Ubuntu SMP Thu Sep 12 13:16:13 UTC 2019
net.core.rmem_default = 212992 (default socket receive buffer size)
net.core.rmem_max = 16777216 (maximum socket receive buffer size)
net.core.wmem_default = 212992 (default socket send buffer size)
net.core.wmem_max = 16777216 (maximum socket send buffer size)
net.ipv4.udp_mem = 102400 873800 16777216
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
4. Setting up the software test environment
4.1 Vendors version
| GitHub Repositories | URL | Commit |
|---|---|---|
| Fast DDS | https://github.com/eProsima/Fast-DDS.git | 636754ab828 |
| Fast CDR | https://github.com/eProsima/Fast-CDR.git | c5668b933b2 |
| Foonathan Memory Vendor | https://github.com/eProsima/foonathan_memory_vendor.git | 7ca1e109c2d |
| Cyclone DDS | https://github.com/eclipse-cyclonedds/cyclonedds | c261053186c |
4.2 Building the libraries
Cyclone DDS
Cyclone DDS has been built following the guidelines outlined in the README.md file of the Cyclone DDS GitHub repository. The example executables are automatically compiled with the default Cyclone DDS build.
Fast DDS
Fast DDS has been built following the guidelines outlined in the Linux installation from sources section of the library's documentation hosted on ReadTheDocs. The installation has been done using Colcon with the CMake options required for building the performance tests.
colcon build --cmake-args -DEPROSIMA_BUILD=ON -DEPROSIMA_BUILD_TESTS=OFF -DGTEST_INDIVIDUAL=OFF -DSECURITY=OFF -DPERFORMANCE_TESTS=ON -DNO_TLS=ON -DCMAKE_BUILD_TYPE=Release
4.3 Test execution
Cyclone DDS
The execution of the Throughput tests has been done following the guidelines provided by Cyclone DDS in its GitHub repository. The results are obtained from the Subscriber's output.
- Terminal 1. Run the publisher with the following settings:
- payloadSize: 11000000 Bytes.
- burstInterval: 0 ms.
- burstSize: 10 samples.
- timeOut: 0 seconds (infinite).
- partitionName: Throughput test.
./ThroughputPublisher 11000000 0 10 0 "Throughput test"
- Terminal 2. Run the subscriber with the following settings:
- maxCycles: 50 cycles.
- pollingDelay: 0 ms (event based).
- partitionName: Throughput test.
./ThroughputSubscriber 50 0 "Throughput test"
Fast DDS
The following parameters have been used to run Fast DDS Throughput tests.
- Payload size: 11000000 Bytes.
- Recovery time: 0 ms.
- Burst size (demand): 10 samples.
To set the above test parameters:
- Replace the content of the <fastdds_workspace>/src/fastrtps/test/performance/throughput/recoveries.csv file with a single 0.
- Replace the content of the <fastdds_workspace>/src/fastrtps/test/performance/throughput/payloads_demands.csv file with the following new content: 11000000;10
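For reference, after these edits each of the two files contains nothing but the following values:
recoveries.csv:
0
payloads_demands.csv:
11000000;10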
Finally, the following command has been executed 50 times to obtain the results of the reception Throughput.
colcon test \
--event-handlers console_direct+ \
--packages-select fastrtps \
--ctest-args -R performance.throughput.interprocess_reliable_shm \
--timeout 3600
5. DDS application configuration
5.1 DataWriter/DataReader configuration
Cyclone DDS & Fast DDS
Both the Cyclone DDS and Fast DDS applications ensure reliable communication at the DDS application level, i.e. the reliability of the DataWriters and DataReaders is configured as RELIABLE regardless of the transport protocol used.
In addition, the History QoS of the DataWriters and DataReaders of both applications is configured as KEEP_ALL.
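As an illustration only (the actual test applications set these policies through their own configuration code), an equivalent endpoint configuration with the Fast DDS DDS-layer API could look as follows; the function and variable names are arbitrary:

#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>
#include <fastdds/dds/subscriber/qos/DataReaderQos.hpp>

using namespace eprosima::fastdds::dds;

// Illustrative sketch: RELIABLE reliability and KEEP_ALL history on both endpoints,
// matching the QoS described above for the two throughput applications.
void configure_endpoint_qos(DataWriterQos& wqos, DataReaderQos& rqos)
{
    wqos.reliability().kind = RELIABLE_RELIABILITY_QOS;
    wqos.history().kind = KEEP_ALL_HISTORY_QOS;

    rqos.reliability().kind = RELIABLE_RELIABILITY_QOS;
    rqos.history().kind = KEEP_ALL_HISTORY_QOS;
}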
Fast DDS
Fast DDS tests have been performed for both publication modes (a configuration sketch follows this list):
- SYNCHRONOUS_PUBLISH_MODE: The data is sent in the context of the user thread that calls the write operation.
- ASYNCHRONOUS_PUBLISH_MODE: An internal thread takes the responsibility of sending the data asynchronously. The write operation returns before the data is actually sent.
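As a sketch of the difference, again using the Fast DDS DDS-layer API for illustration (the helper below is hypothetical and not part of the test code):

#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>

using namespace eprosima::fastdds::dds;

// Selects the publication mode of a DataWriter.
// SYNCHRONOUS_PUBLISH_MODE: data is sent in the user thread that calls write().
// ASYNCHRONOUS_PUBLISH_MODE: an internal thread sends the data after write() returns.
void set_publish_mode(DataWriterQos& wqos, bool asynchronous)
{
    wqos.publish_mode().kind = asynchronous ? ASYNCHRONOUS_PUBLISH_MODE : SYNCHRONOUS_PUBLISH_MODE;
}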
5.2 Transport
Cyclone DDS
It uses the UDP transport protocol. Intra-process communication does not apply in this case, since the Publisher and the Subscriber run in different processes on the same system.
Fast DDS
It uses the Shared Memory (SHM) transport protocol. The configured SHM transport parameters are listed below; a configuration sketch follows the list.
- Segment size (segment_size()): the size of the shared memory segment (in octets). It is set to 2 GB in order to have enough memory to perform the test.
  segment_size() = 2147000000
- Max message size (maxMessageSize): the maximum size of a single message in the transport. It is set to 12 MB to avoid data fragmentation.
  maxMessageSize = 12000000
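The sketch below shows how a transport descriptor with these two values could be registered through the Fast DDS API; it is illustrative of the parameters above, not an excerpt from the test code, and disabling the built-in transports is an assumption:

#include <memory>

#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/shared_mem/SharedMemTransportDescriptor.h>

// Illustrative sketch: register a shared memory transport with the segment size
// and maximum message size used in the throughput test.
void configure_shm_transport(eprosima::fastdds::dds::DomainParticipantQos& pqos)
{
    auto shm = std::make_shared<eprosima::fastdds::rtps::SharedMemTransportDescriptor>();
    shm->segment_size(2147000000);   // ~2 GB shared memory segment
    shm->maxMessageSize = 12000000;  // 12 MB maximum message size, avoids fragmentation

    pqos.transport().user_transports.push_back(shm);
    pqos.transport().use_builtin_transports = false;  // assumption: only the SHM transport is used
}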
6. Test results
The following table shows the processed results of 50 throughput test runs for each of the configurations and DDS implementations.
| | Fast DDS (SHM - Sync.) | Fast DDS (SHM - Async.) | Cyclone DDS (UDP) |
|---|---|---|---|
| Mean | 12892.34 | 12868.48 | 8938.42 |
| Standard deviation | 99.49 | 92.48 | 178.35 |
| Median | 12882.35 | 12878.37 | 8976.00 |
The following two figures show the resulting throughput distribution of the 50 iterations of the throughput tests. They show that the results of the Fast DDS application are more uniform, i.e. the throughput results are less spread out across the different test runs. This can also be seen in the standard deviation, which is lower for Fast DDS than for Cyclone DDS.
The following bar chart shows a comparison of the average throughput of all test runs for each of the DDS vendors.
Further information
For any questions about the methodology, the specifics of how to replicate these results, or any other question you may have, please do not hesitate to contact us.