Optimizing Real-Time Data Delivery with UDP Offload Engines in LambdaGrids


Real-time interactive scientific visualization and high-definition video conferencing in large-scale visualization environments such as LambdaGrids require high-throughput, low-latency, and low-jitter data delivery. This presentation discusses the motivation for UDP-based transport protocols in such environments and the problem that these protocols are CPU-intensive. It describes how a TCP Offload Engine can be converted into a UDP Offload Engine and reports throughput, CPU utilization, and latency results from experiments on a local-area test bed.


Uploaded on Sep 25, 2024



Presentation Transcript


  1. A Case for UDP Offload Engines in LambdaGrids. Venkatram Vishwanath and Jason Leigh, Electronic Visualization Laboratory, University of Illinois at Chicago; Pavan Balaji and Dhabaleswar Panda, Ohio State University; Wu-chun Feng, Virginia Tech.

  2. Motivation. Real-time interactive scientific visualization and high-definition video conferencing require high-throughput, low-latency, and low-jitter data delivery. UDP is commonly used to transport real-time streaming media. The trend in large-scale visualization is to give users thin, low-maintenance clients and stream high-resolution visualizations to them from remote supercomputers, e.g. TeraGrid, OptIPuter, IBM Deep Computing Visualization, and Sun Ray. [Slide image: Scripps Institution of Oceanography]

  3. Motivation. Reliable UDP-based transport protocols (LambdaStream, RBUDP, UDT, Tsunami) have found a home on the LambdaGrid, where applications can provision network paths for their sole use. However, these protocols are CPU-intensive. While TCP offload is commercially available, no equivalent exists for UDP.
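To see where the CPU cost of a reliable UDP protocol comes from, consider the per-packet bookkeeping the host performs. The sketch below is loosely modeled on the RBUDP idea (blast the data as UDP datagrams, then have the receiver report which sequence numbers are missing so the sender can resend them). The class and method names are hypothetical, and a real protocol also needs rate control, timers, and a control channel.

```python
class BlastReceiver:
    """Hypothetical receiver-side bookkeeping for a blast-style reliable UDP transfer."""

    def __init__(self, total_packets: int, payload_size: int) -> None:
        self.total = total_packets
        self.payload_size = payload_size
        self.received = bytearray(total_packets)            # one 0/1 flag per sequence number
        self.buffer = bytearray(total_packets * payload_size)

    def on_datagram(self, seq: int, payload: bytes) -> None:
        # Per-packet host work: bounds check, copy into place, mark as received.
        if 0 <= seq < self.total and not self.received[seq]:
            payload = payload[: self.payload_size]           # never overrun the slot
            start = seq * self.payload_size
            self.buffer[start:start + len(payload)] = payload
            self.received[seq] = 1

    def missing(self) -> list[int]:
        # The list the receiver would report back to the sender after a blast round.
        return [seq for seq, flag in enumerate(self.received) if not flag]


rx = BlastReceiver(total_packets=5, payload_size=4)
for seq in (0, 1, 3):
    rx.on_datagram(seq, b"abcd")
print(rx.missing())  # [2, 4]
```

Every datagram passes through this loop on the host CPU, which is the work a UDP offload engine would move onto the network adapter.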

  4. High Performance Sockets and Protocol Offload Engines. [Diagram: the traditional sockets stack (Application, Sockets, TCP/IP, Device Driver, Network Adapter) compared with a high-performance sockets stack in which the transport protocol is offloaded to the network adapter] Offload the processing of protocols from the CPU to the network adapter, similar to how a GPU offloads graphics (here the adapter acts as an NPU, a network processing unit). Currently only TCP Offload Engines (TOEs) are available.

  5. Converting a TCP Offload Engine (TOE) to a UDP Offload Engine (UOE). Disable congestion control. Change the driver to allow out-of-order packets, whereas the default TOE compensates for reordering. Seamlessly hijack an application's UDP calls to use this alternative: implement a Connection Management Layer to turn a connectionless protocol (such as UDP) into a connection-oriented one (as the TOE expects), and implement our own Data Management Layer to deal with out-of-order packets. Eliminate TCP's need for acknowledgments. This could not be disabled completely on the Chelsio T110; the temporary workaround is to set very large window sizes.
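The two shim layers on this slide can be sketched in a few lines, assuming each datagram carries a sequence number. A connection table keyed by the peer's (host, port) address gives connectionless UDP traffic a per-peer connection, and a per-connection reorder buffer releases payloads in sequence order. Reordering is only one way to "deal with" out-of-order packets, and the names (`UoeShim`, `ReorderBuffer`, `open_connection`) are hypothetical: the slide does not describe the actual interfaces, and this pure-Python model involves no NIC.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]


class ReorderBuffer:
    """Data management: releases payloads in sequence order even if they arrive out of order."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: Dict[int, bytes] = {}

    def push(self, seq: int, payload: bytes) -> List[bytes]:
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate or already delivered
        self.pending[seq] = payload
        ready: List[bytes] = []
        # Drain every consecutive payload starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


class UoeShim:
    """Connection management: maps each UDP peer to its own connection state."""

    def __init__(self) -> None:
        self.connections: Dict[Address, ReorderBuffer] = {}

    def open_connection(self, peer: Address) -> ReorderBuffer:
        # The first datagram exchanged with a peer implicitly opens a connection.
        conn = self.connections.get(peer)
        if conn is None:
            conn = self.connections[peer] = ReorderBuffer()
        return conn

    def deliver(self, peer: Address, seq: int, payload: bytes) -> List[bytes]:
        # Hand back whatever is now in order for this peer.
        return self.open_connection(peer).push(seq, payload)


shim = UoeShim()
peer = ("192.0.2.10", 5000)
print(shim.deliver(peer, 1, b"B"))  # [] (still waiting for seq 0)
print(shim.deliver(peer, 0, b"A"))  # [b'A', b'B']
print(shim.deliver(peer, 2, b"C"))  # [b'C']
```

Keying connections on the peer address is what lets the application keep calling plain UDP send and receive functions while the layer underneath presents a connection-oriented interface.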

  6. Experiment Test Bed. Local area network. Dual 2.4 GHz Opterons with 1 MB L2 cache; 4 GB of 200 MHz DDR SDRAM; vanilla 2.6.6 SMP Linux kernel; Chelsio T110 with Chelsio driver version 2.1.1.

  7. Throughput: UOE vs. Traditional Host-based UDP. [Plot: Throughput vs. Message Size, throughput (Mbits/sec, 0 to 8000) against message size (12 bytes to 64 KB) for UOE and UDP] Initial Iperf results: 7.4 Gbps maximum throughput, a 35% improvement over host-based UDP.

  8. CPU Utilization: UOE vs. Host-based UDP. [Plot: Throughput and CPU Utilization vs. Message Size, throughput (Mbits/sec, 0 to 8000) and CPU utilization (0 to 0.5) against message size (12 bytes to 64 KB) for UOE, UDP, UOE (CPU), UDP (CPU), and UOE (normalized CPU)] Up to 50% improvement in CPU utilization. This matters because in real applications the CPU has other work to do besides moving data, e.g. decoding or decrypting the images or data.
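CPU utilization here is the fraction of elapsed time the host spends on the transfer. A minimal way to estimate it for a host-based UDP sender is to compare process CPU time with wall-clock time. This sketch assumes a single-threaded process sending over the loopback interface; the function name and parameters are hypothetical, and it measures only the sending side, not the receiver or the NIC.

```python
import socket
import time


def measure_send_cpu(message_size: int, count: int) -> tuple[float, float]:
    """Send `count` UDP datagrams to a local port; return (cpu_utilization, mbits_per_sec).

    message_size must fit in one IPv4 UDP datagram (at most 65507 bytes).
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # OS picks a free port; nothing reads it, excess is dropped
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * message_size
    dest = receiver.getsockname()

    # The CPU-time interval is nested inside the wall-clock interval.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(count):
        sender.sendto(payload, dest)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start

    sender.close()
    receiver.close()
    if wall_used <= 0:
        return 0.0, 0.0
    utilization = cpu_used / wall_used
    mbits = message_size * count * 8 / wall_used / 1e6
    return utilization, mbits


util, mbits = measure_send_cpu(message_size=1024, count=2000)
print(f"CPU utilization {util:.2f}, send rate {mbits:.0f} Mbit/s")
```

Calling this for a range of message sizes mirrors the shape of the sweep in the plot, though loopback numbers say nothing about the T110 itself. The slide does not define its "normalized CPU" metric; dividing `util` by `mbits` to get CPU per Mbit/s is one possible normalization, offered here only as an assumption.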

  9. Latency. [Plot: Latency (usec, 0 to 40) against message size (1 to 1024 bytes) for UOE and UDP] 17% improvement in latency.
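Small-message latency is commonly estimated with a ping-pong test: time many round trips and halve the average. A minimal sketch over UDP loopback follows, assuming a single echo thread and no packet loss. A real benchmark would add warm-up rounds and many more iterations, and the function names are hypothetical.

```python
import socket
import threading
import time


def echo_server(sock: socket.socket, rounds: int) -> None:
    # Reflect each datagram back to whoever sent it.
    for _ in range(rounds):
        data, addr = sock.recvfrom(65535)
        sock.sendto(data, addr)


def ping_pong_latency_us(message_size: int, rounds: int = 1000) -> float:
    """Return the average one-way latency (half the round trip) in microseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)  # raise instead of hanging if a datagram is ever lost
    thread = threading.Thread(target=echo_server, args=(server, rounds), daemon=True)
    thread.start()

    payload = b"\x00" * message_size
    dest = server.getsockname()
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendto(payload, dest)   # one message in flight at a time
        client.recvfrom(65535)
    elapsed = time.perf_counter() - start

    thread.join()
    client.close()
    server.close()
    return elapsed / rounds / 2 * 1e6


for size in (1, 64, 1024):
    print(size, f"{ping_pong_latency_us(size, rounds=500):.1f} us")
```

Keeping only one message in flight is what makes the measurement a latency test rather than a throughput test.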

  10. Conclusion. There is a case for UDP offload engines in applications: offloading delivers real performance benefits, and it is especially useful for streaming visualization and high-definition video. Message from Venkat to Michael Chen :^) I want T210s to replace the T110s; the T210 is the lower-power and much smaller version. When will PCI-Express NICs be available? Neterion already has one available. I would like to correspond more closely with Chelsio engineers to resolve the ACK issues.

  11. Future Work. Compare with partial offload and quantify which part of the stack is most useful to offload, e.g. offloading only the packet checksum rather than the entire UDP stack. Conduct MAN and WAN trials. Compare with other partial-offload NICs such as Neterion. Implement and compare on Myrinet 10G. Apply the UOE implementation to an existing UDP-based transport protocol such as LambdaStream, RBUDP, UDT, or Tsunami.
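Checksum offload is the kind of partial offload this slide contrasts with offloading the full UDP stack: the NIC computes the 16-bit ones'-complement Internet checksum (RFC 1071) that UDP would otherwise compute on the host for every datagram. The sketch computes that checksum over a byte string; a real UDP checksum also covers the IP pseudo-header, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as defined in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]         # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)      # fold carries back into the low 16 bits
    return ~total & 0xFFFF


# Example bytes from RFC 1071: the folded sum is 0xDDF2, so the checksum is 0x220D.
sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
print(hex(internet_checksum(sample)))  # 0x220d
```

A receiver verifies a packet by running the same sum over the data plus the transmitted checksum and checking that the result is zero; on the host this is a pass over every byte of every datagram, which is why it is a natural first candidate for offload.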

  12. Thank You. Chelsio engineers; National Science Foundation: CNS-0224306 Research Resources: Matching Visualization & Intelligent Data Mining to Experimental Networks; CNS-0420477 LambdaVision (Major Research Instrumentation); OCI-0229642 StarLight: Strategic Technologies for Internet Discovery and Development (STI); OCI-0441094 TransLight/StarLight; OCI-0225642 The OptIPuter; National Institutes of Health; the State of Illinois; the Office of Naval Research; NTT Optical Network Systems Laboratory in Japan.
