Prototyping for Forward CSTS Performance Analysis
Evaluation of CSTS WG prototyping for enhancing forward frame service performance. Based on NASA reports, the presentation proposes blocking several frames into one data parameter to increase throughput and reports measurements made with a dedicated set-up. The objective is to support the selection of the most appropriate approach for the CSTS Forward Frame Specification.
CSTS WG Prototyping for Forward CSTS Performance
Boulder, November 2011
Martin Karch
Prototyping for Fwd CSTS Performance
Structure of the Presentation
- Background and Objective
- Measurement Set-Up
- Results
- Summary
Background and Objective
Reports from NASA on prototyping an experimental Synchronous Forward Frame Service (CLTU Orange Book):
- The CLTU service approach seriously limits throughput.
- The target rate (25 Mbit/s) could only be reached by blocking 7 frames of 220 bytes length into the single data parameter of one CLTU.
- No radiation reports for individual CLTUs (only for the complete one).
No investigation available yet on:
- Why can the throughput not be reached when frames are transferred in individual CLTUs?
- What is the cost of acknowledging every frame?
Background and Objective
Based on the reports: suggest blocking of several frames into one data parameter for a potential future Forward Frame CSTS (Process Data operation / Data Processing procedure, FW).
Objective of the prototyping:
- Verify that blocking of data items significantly increases the throughput.
- Investigate whether the bottleneck is in the service provisioning (the actual protocol between user and provider).
- The results shall support selection of the most appropriate approach for the CSTS FW Forward Specification.
- Measurements are made for protocol performance.
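The blocking idea itself is simple to state in code. The sketch below is illustrative only (the function name and frame counts are hypothetical, not from the CSTS specification): grouping fixed-length frames into blocks reduces the number of transfer invocations, and therefore the number of acknowledgements, by the blocking factor.

```python
def block_frames(frames, frames_per_block):
    """Group a list of frame byte strings into blocks, each of which would
    be carried by a single data parameter in one transfer invocation."""
    blocks = []
    for i in range(0, len(frames), frames_per_block):
        blocks.append(b"".join(frames[i:i + frames_per_block]))
    return blocks

# 220-byte frames and a blocking factor of 7, as in the NASA report
frames = [bytes(220) for _ in range(7000)]
unblocked = block_frames(frames, 1)   # one invocation per frame
blocked = block_frames(frames, 7)     # 7 frames per invocation
print(len(unblocked), len(blocked))   # -> 7000 1000
```

With a blocking factor of 7, the same 7,000 frames require only 1,000 invocations, each carrying a 1,540-byte data parameter.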
Measurement Set-Up
Two machines, each equipped with a Xeon 4C X3460 (2.8 GHz / 1333 MHz / 8 MB), 4 GB memory, Linux SLES 11 64-bit, connected over an isolated LAN by a 1 Gbit cable connection (no switch).
- Provider machine: adapted SGM (SLE Ground Models) acting as SLE provider, with a simulated radiation process.
- User machine: adapted NIS (Network Interface System) model acting as SLE user.
- CLTUs are transferred via SLE operations through the SLE API on each side, over TCP/IP.
Measurement Set-Up
Provider: SGM (SLE Ground Models) simulation environment, changed such that
- the receiving thread puts CLTUs on a queue for radiation;
- a radiation thread removes CLTUs and discards them;
- there is no further simulation of the radiation process (radiation duration).
User: NIS (Network Interface System) simulation environment, modified to
- create CLTU operation objects as fast as possible;
- immediately pass them to the SLE API for transmission;
- no interface to a Mission Control System (MCS).
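The adapted radiation path on the provider side can be sketched with a thread and a queue. This is a minimal, hypothetical reconstruction of the structure described above, not the actual SGM code: the receiving side enqueues CLTUs, and a radiation thread dequeues and discards them with no simulated radiation duration.

```python
import queue
import threading

radiation_queue = queue.Queue()
SENTINEL = None          # marks end of input for this sketch
discarded = 0

def radiation_thread():
    """Remove CLTUs from the queue and discard them immediately
    (no simulation of the radiation duration)."""
    global discarded
    while True:
        cltu = radiation_queue.get()
        if cltu is SENTINEL:
            break
        discarded += 1   # discard: no further processing

t = threading.Thread(target=radiation_thread)
t.start()
for _ in range(1000):                  # stand-in for the receiving thread
    radiation_queue.put(bytes(1544))   # enqueue a CLTU for "radiation"
radiation_queue.put(SENTINEL)
t.join()
print(discarded)                       # -> 1000
```

Because radiation is reduced to a dequeue-and-discard, the measurement isolates the protocol and API path rather than the radiation process.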
Measurement Set-Up
Basis for all steps: SGM-based provider, NIS-based user.
Step 1 measurements: variation of the CLTU length
- simulates sending many small CLTUs
- in one TRANSFER-DATA invocation (first approximation)
Step 2 measurements: SLE API modified to
- aggregate a configurable number of CLTUs (SEQUENCE OF Cltu)
- with minimum annotation (CLTU id, sequence count)
- send the return when the last data unit is acknowledged
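The Step-2 aggregation can be sketched as follows. This is a hypothetical illustration of the "SEQUENCE OF Cltu with minimum annotation" idea, using Python dictionaries in place of the real ASN.1 structures; field names are assumptions, not spec identifiers.

```python
def aggregate_cltus(cltus, batch_size, first_id=0):
    """Group CLTUs into batches; each entry carries only the minimal
    annotation (CLTU id, sequence count within the batch) plus the data."""
    batches = []
    cltu_id = first_id
    for i in range(0, len(cltus), batch_size):
        batch = []
        for seq, data in enumerate(cltus[i:i + batch_size]):
            batch.append({"cltu_id": cltu_id, "seq_count": seq, "data": data})
            cltu_id += 1
        batches.append(batch)   # one batch = one SEQUENCE OF Cltu
    return batches

batches = aggregate_cltus([bytes(1544) for _ in range(100)], batch_size=10)
print(len(batches), len(batches[0]))   # -> 10 10
```

Only one return per batch is then sent, when the last data unit of the batch has been acknowledged, rather than one return per CLTU.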
Step 1 / Measurement 1
Observations: linear curve, proportional to CLTU size; constant processing time, independent of CLTU size.
Configuration: SGM + NIS model optimised: yes; SLE API optimised: no; Nagle + delayed ACK: on; RTT: 0.1 ms
Step 1 / Measurement 2
Configuration: SGM + NIS model optimised: yes; SLE API optimised: yes; Nagle + delayed ACK: on; RTT: 0.1 ms
Step 1 / Measurement 3
Configuration: SGM + NIS model optimised: yes; SLE API optimised: yes; Nagle + delayed ACK: off; RTT: 0.1 ms
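The "Nagle + delayed ACK: off" configuration corresponds, on the sender side, to disabling Nagle's algorithm per socket; in Python this is done with the `TCP_NODELAY` option, as sketched below. (Delayed ACK is controlled on the receiver's operating system side and is not shown here.)

```python
import socket

# Create a TCP socket and disable Nagle's algorithm on it, so small
# writes are sent immediately instead of being coalesced.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay)   # non-zero when Nagle is disabled
sock.close()
```

With many small request/response exchanges (one acknowledgement per CLTU), leaving Nagle and delayed ACK both on makes the writer wait for ACKs that the receiver is deliberately delaying, which is consistent with the factor-2.5 penalty reported in the summary.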
Step 1 / Measurement 4
Observations: processing time still constant; transfer time increased.
Configuration: SGM + NIS model optimised: yes; SLE API optimised: yes; Nagle + delayed ACK: off; RTT: 400 ms
Step 1 / Measurement 5.1
- Measurement 5.1: reference measurement for measurements with varying RTT, performed using iPerf.
- Measurement 5.2: measurements using SGM + NIS.
Step 1 / Measurement 5.2
Observations: shows the influence of the transmission time only; the delay is the dominating factor, as expected (throughput proportional to 1/RTT).
- Ratio measurement/iPerf = 0.165 (1544-byte CLTUs)
- Ratio measurement/iPerf = 0.153 (1000-byte CLTUs)
Configuration: SGM + NIS model optimised: yes; SLE API optimised: yes; Nagle + delayed ACK: off; RTT: variable
Step 1 / Measurement 5 (2)
Operates with maximum send and receive buffer.
Question: how big must the window size be to achieve throughput values similar to the above (for the example of 40 Mbit/s)?
Maximum data rate = buffer size / RTT
Window sizes: 64 KB, 13 MB, 25 MB, 25 MB

CLTU size [byte] | CLTU count | Data [byte] | RTT [ms] | Duration [s] | Data rate [Mbit/s] | CLTU rate [#/s] | Send time per CLTU [ms]
1544 |  5,000 |  7,720,000 |  50 |  36.056 |  1.713 |  138.673 |  7.211
1544 |  5,000 |  7,720,000 | 100 |  71.761 |  0.861 |   69.676 | 14.352
1544 |  5,000 |  7,720,000 | 200 | 143.351 |  0.431 |   34.879 | 28.670
1544 |  5,000 |  7,720,000 | 400 | 286.005 |  0.216 |   17.482 | 57.201
1000 |  5,000 |  5,000,000 |  50 |  25.077 |  1.595 |  199.386 |  5.015
1000 |  5,000 |  5,000,000 | 100 |  50.080 |  0.799 |   99.840 | 10.016
1000 |  5,000 |  5,000,000 | 200 | 100.079 |  0.400 |   49.961 | 20.016
1000 |  5,000 |  5,000,000 | 400 | 200.087 |  0.200 |   24.989 | 40.017
1544 | 50,000 | 77,200,000 | 100 |  52.287 | 11.812 |  956.261 |  1.046
1544 | 50,000 | 77,200,000 | 100 |  52.939 | 11.666 |  944.483 |  1.059
1544 | 50,000 | 77,200,000 | 0.1 |  15.355 | 40.221 | 3256.268 |  0.307
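The rule of thumb used above, maximum data rate = buffer size / RTT (the bandwidth-delay product), can be checked numerically. The concrete numbers below are illustrative calculations from the formula, not measured values.

```python
def max_rate_mbps(window_bytes, rtt_s):
    """Maximum achievable TCP throughput in Mbit/s for a given
    window size and round-trip time (bandwidth-delay product)."""
    return window_bytes * 8 / rtt_s / 1e6

def required_window_bytes(rate_mbps, rtt_s):
    """TCP window needed to sustain a given rate over a given RTT."""
    return rate_mbps * 1e6 / 8 * rtt_s

# A default 64 KB window over a 400 ms RTT caps throughput far below 40 Mbit/s:
print(round(max_rate_mbps(64 * 1024, 0.400), 2))   # -> 1.31
# Sustaining 40 Mbit/s over a 100 ms RTT needs roughly a 500 KB window:
print(round(required_window_bytes(40, 0.100)))     # -> 500000
```

This is why the measured rates in the table collapse as the RTT grows: once the window is full, the sender must idle for the remainder of each round trip.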
Step 1 Measurements Summary
- Linear increase of the data rate with CLTU length when sending as fast as possible with no network delay; constant processing time.
- Best results with optimised code (5 to 10 % performance increase from optimising the SLE API alone) and with Nagle and delayed ACK switched off (throughput is a factor of 2.5 lower when Nagle's algorithm and delayed ACK are both on); no network delay.
- A network delay of 200 ms (400 ms RTT) causes a performance decrease by a factor of 400 compared to Measurement 2 (the best one).
- Maximum data rate = buffer size / RTT.
- We have to pay attention to the size of the CLTU.
What is the Cost of Confirmed Operations?
- Data unit size 8000 byte: CLTU 207.57 Mbit/s vs. RAF 318.32 Mbit/s (frame size 8000 byte, 1 frame per buffer): increase by 53 %.
- Data unit size 2000 byte: CLTU 53.36 Mbit/s vs. RAF 85.64 Mbit/s (frame size 2000 byte, 1 frame per buffer): increase by 60 %.
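The quoted percentages follow directly from the measured rates; recomputing them makes the comparison explicit.

```python
def increase_percent(unconfirmed_mbps, confirmed_mbps):
    """Throughput gain of the unconfirmed (RAF) transfer over the
    confirmed (CLTU) transfer, in percent."""
    return (unconfirmed_mbps / confirmed_mbps - 1) * 100

print(round(increase_percent(318.32, 207.57)))   # 8000-byte units -> 53
print(round(increase_percent(85.64, 53.36)))     # 2000-byte units -> 60
```

The relative cost of confirmation grows as the data unit shrinks, since each acknowledgement round trip is amortised over fewer payload bytes.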
Effects of Buffering (RAF)

Frame size [byte] | Frames/buffer | Mbit/s | Frames/s | ms/frame
8000 |  1 | 322.79 |  5,120 | 0.195
4000 |  2 | 244.41 |  7,903 | 0.126
2000 |  4 | 167.02 | 10,845 | 0.092
1000 |  8 | 108.18 | 13,310 | 0.075
 800 | 10 |  88.67 | 15,102 | 0.066
 400 | 20 |  46.25 | 17,128 | 0.058
 200 | 40 |  27.05 | 18,969 | 0.052
 100 | 80 |  13.78 | 19,122 | 0.052

Frame size 2000 byte, 1 frame/buffer: 85.64 Mbit/s.
Concatenation of 80 frames of 100 byte back-to-back into a buffer, passed to the API as one frame: 322.43 Mbit/s.
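The table is consistent with a simple model: every 8000-byte buffer costs a fixed transfer time plus a per-frame handling overhead. Estimating that overhead from the 1-frame and 80-frame rows (values taken from the table above; the model itself is an assumption, not part of the measurements):

```python
BUFFER_BITS = 8000 * 8   # every row transfers 8000-byte buffers

def time_per_buffer_ms(mbps):
    """Wall-clock time to move one buffer at the measured rate."""
    return BUFFER_BITS / (mbps * 1e6) * 1000

t1 = time_per_buffer_ms(322.79)    # 1 frame of 8000 byte per buffer
t80 = time_per_buffer_ms(13.78)    # 80 frames of 100 byte per buffer
# The 79 extra frames account for the extra time per buffer:
overhead_us = (t80 - t1) / 79 * 1000
print(round(overhead_us))          # -> 56 (microseconds per frame)
```

A fixed per-frame cost of roughly 56 µs also explains why the frames/s column saturates near 19,000 for small frames, and why concatenating 80 small frames into one large frame restores nearly the full single-frame throughput.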
Same in Graphical Presentation
[Chart: effect of buffering; throughput (y-axis 0 to 350,000) for each frame-size/frames-per-buffer combination, from 8000-1 to 100-80.]
RAF Measurement Configuration
Data path: Frame Generator -> SLE Service Provider Application -> SLE API (transfer buffer) -> TCP (local) -> Communication Server -> TCP -> SLE API (transfer buffer) -> SLE Service User Application.
Cost of ASN.1 Encoding (RAF)
Result of profiling for RAF, frame size 100 byte, 80 frames per buffer:
- Encoding of the transfer buffer including all contained frames: 6.42 %
- Encoding of the transfer buffer invocation alone: 2.31 %
Effects might be caused by increased interactions, interrupts, etc.
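A figure like the 6.42 % share can be obtained by timing the encoder against the whole buffer-handling path. The sketch below is a stand-in only: simple `struct` packing replaces the real ASN.1 encoder, and the "rest of the path" is reduced to concatenation, so the resulting share is illustrative rather than comparable to the measured one.

```python
import struct
import time

def encode(frame):
    """Stand-in encoder: a length-prefixed frame (not real ASN.1)."""
    return struct.pack(">I", len(frame)) + frame

def handle_buffer(frames):
    """Whole (simplified) path: encode all frames and build the buffer."""
    return b"".join(encode(f) for f in frames)

frames = [bytes(100)] * 80             # 80 frames of 100 byte per buffer

t0 = time.perf_counter()
for _ in range(2000):
    handle_buffer(frames)
total = time.perf_counter() - t0       # time for the whole path

t0 = time.perf_counter()
for _ in range(2000):
    for f in frames:
        encode(f)
enc = time.perf_counter() - t0         # time for encoding alone

print(f"encoding share ~ {enc / total:.0%}")
```

In a real profiling run, the cumulative time attributed to the encoder functions is related to the total processing time in the same way.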
Summary of Observations
- The size of the transferred data unit has a significant impact.
- Almost constant end-to-end processing time, independent of buffer size.
- Linear increase of the net bit rate with data unit size.
- Large impact of network delay due to TCP (expected).
- Significant additional cost of using confirmed operations.
- Buffering of frames vs. transfer in individual frames: 4 frames of 2 KB per buffer vs. single 2 KB frames yields a factor of 1.9. BUT: the throughput for a single large data unit is much larger than for a buffer of the same size containing multiple small units.
- ASN.1 encoding accounts for 6.4 % of the overall local processing time in the worst-case test.