Performance Characterization of a 10-Gigabit Ethernet TOE

W. Feng¥, P. Balajiα, C. Baron£, L. N. Bhuyan£, D. K. Pandaα

¥ Advanced Computing Lab, Los Alamos National Lab
α Network Based Computing Lab, Ohio State University
£ CARES Group, U. C. Riverside
 
Ethernet Overview
 
Ethernet is the most widely used network infrastructure today
Traditionally, Ethernet has been notorious for performance issues
Near an order-of-magnitude performance gap compared to IBA, Myrinet, etc.
Cost-conscious architecture
Most Ethernet adapters were regular (layer-2) adapters
Relied on host-based TCP/IP for network and transport layer support
Compatibility with existing infrastructure (switch buffering, MTU)
Used by 42.4% of the Top500 supercomputers
Key: Reasonable performance at low cost
TCP/IP over Gigabit Ethernet (GigE) can nearly saturate the link for current systems
Several local stores give out GigE cards free of cost!
10-Gigabit Ethernet (10GigE) recently introduced
10-fold (theoretical) increase in performance while retaining existing features

10GigE: Technology Trends

Three levels of technology
Regular 10GigE adapters
Layer-2 adapters
Rely on host-based TCP/IP to provide network/transport functionality
Can achieve high performance with optimizations [feng03:hoti, feng03:sc]
TCP Offload Engines (TOEs) [Evaluation based on the Chelsio T110 TOE adapters]
Layer-4 adapters
Have the entire TCP/IP stack offloaded onto hardware
Sockets layer retained in the host space
RDDP-aware adapters
Layer-4 adapters
Entire TCP/IP stack offloaded onto hardware
Support more features than TCP Offload Engines
No sockets! Richer RDDP interface!
E.g., out-of-order placement of data, RDMA semantics
 
Presentation Overview
 
Introduction and Motivation
TCP Offload Engines Overview
Experimental Evaluation
Conclusions and Future Work

What is a TCP Offload Engine (TOE)?

[Figure: Traditional TCP/IP stack vs. TOE stack]
Traditional TCP/IP stack: Application or Library (User); Sockets Interface, TCP, IP, Device Driver (Kernel); Network Adapter, e.g., 10GigE (Hardware)
TOE stack: Application or Library (User); Sockets Interface, Device Driver (Kernel); Offloaded TCP and Offloaded IP on the Network Adapter, e.g., 10GigE (Hardware)

Interfacing with the TOE

[Figure: two ways of interfacing with the TOE, each over a High Performance Network Adapter with Network Features (e.g., Offloaded Protocol); the figure distinguishes a Control Path from a Data Path]
High Performance Sockets (Application or Library; Traditional Sockets Interface; High Performance Sockets; User-level Protocol; TCP/IP; Device Driver)
No changes required to the core kernel
Some of the sockets functionality duplicated
TCP Stack Override (Application or Library; Sockets Layer; TCP/IP; TOM; toedev; Device Driver)
Kernel needs to be patched
Some of the TCP functionality duplicated
No duplication in the sockets functionality

What does the TOE (NOT) provide?

[Figure: Application or Library (User); Traditional Sockets Interface (Kernel); Transport Layer (TCP) and Network Layer (IP) (Kernel or Hardware); Device Driver (Kernel); Network Adapter, e.g., 10GigE (Hardware)]
1. Compatibility: network-level compatibility with existing TCP/IP/Ethernet; application-level compatibility with the sockets interface
2. Performance: application performance is no longer restricted by the performance of the traditional host-based TCP/IP stack
3. Feature-rich interface: the application interface is still restricted to the sockets interface! [rait05]

[rait05] P. Balaji, H. W. Jin, K. Vaidyanathan and D. K. Panda. Support iWARP compatibility and features for regular network adapters. In the RAIT workshop, held in conjunction with Cluster Computing, Aug 26th, 2005.
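In the TCP Stack Override approach, there is a single sockets layer, and offloaded connections are redirected underneath it. The toy Python model below illustrates only that idea. The application always calls the same `send()`, and a control-path decision picks which data path carries the bytes. This is a sketch under stated assumptions, not the kernel's toedev/TOM interface:

- The class names and the `toe_capable` flag are hypothetical.
- The assumption that the decision is made once per connection, at setup, is also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NetDevice:
    """A network interface; `toe_capable` is a hypothetical flag for this sketch."""
    name: str
    toe_capable: bool


class HostTcpPath:
    """Models the host-based TCP/IP data path (no real networking)."""

    def send(self, payload: bytes) -> str:
        return f"host TCP/IP: {len(payload)} bytes segmented by the host CPU"


class OffloadedTcpPath:
    """Models a TOM-like path that hands data to the adapter's offloaded TCP/IP."""

    def send(self, payload: bytes) -> str:
        return f"TOM: {len(payload)} bytes handed to the offloaded TCP/IP on the adapter"


class Socket:
    """Single sockets layer shared by both paths (no sockets duplication)."""

    def __init__(self, device: NetDevice) -> None:
        # Control path (assumed here to run once, at connection setup):
        # choose the data path based on the outgoing device.
        self._path = OffloadedTcpPath() if device.toe_capable else HostTcpPath()

    def send(self, payload: bytes) -> str:
        # Data path: the application-facing call is identical in both cases.
        return self._path.send(payload)


if __name__ == "__main__":
    for dev in (NetDevice("eth0", toe_capable=False), NetDevice("toe0", toe_capable=True)):
        print(f"{dev.name}: {Socket(dev).send(b'x' * 1024)}")
```

The model shows that application code such as `Socket(...).send(...)` stays the same; only what sits below the sockets layer changes. That redirection happens inside the kernel rather than in a replacement sockets library.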
 
Experimental Test-bed and the Experiments
 
Two test-beds used for the evaluation
Two 2.2GHz Opteron machines with 1GB of 400MHz DDR SDRAM
Nodes connected back-to-back
Four 2.0GHz quad-Opteron machines with 4GB of 333MHz DDR SDRAM
Nodes connected with a Fujitsu XG1200 switch (450ns flow-through latency)
Evaluations in three categories
Sockets-level evaluation
Single-connection Micro-benchmarks
Multi-connection Micro-benchmarks
MPI-level Micro-benchmark evaluation
Application-level evaluation with the Apache Web-server
 
Latency and Bandwidth Evaluation (MTU 9000)
 
TOE achieves a latency of about 8.6 µs and a bandwidth of 7.6 Gbps at the sockets layer
Host-based TCP/IP achieves a latency of about 10.5 µs (25% higher) and a bandwidth of 7.2 Gbps (5% lower)
For jumbo frames, host-based TCP/IP performs quite close to the TOE
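The latency numbers come from a ping-pong test. One side sends a message, the other echoes it back, and one-way latency is taken as half the average round-trip time. Below is a minimal Python sketch of that measurement, with these assumptions:

- A TCP connection over loopback stands in for the two back-to-back nodes. In a real run, the two ends would be on different machines, and the same sockets code would run unchanged over either the TOE or the host stack.
- `TCP_NODELAY` is set so that Nagle's algorithm does not hold back small messages.
- The function names are hypothetical.

```python
import socket
import threading
import time


def tcp_pair():
    """Return two connected TCP sockets over loopback (stand-in for two nodes)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
    for s in (client, server):
        # Disable Nagle's algorithm so small messages are sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return client, server


def recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may return partial reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def echo(sock, msg_size, iterations):
    """Pong side: echo every message back to the sender."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, msg_size))


def pingpong_latency_us(msg_size=1, iterations=1000, warmup=100):
    """One-way latency estimate in microseconds: half the mean round-trip time."""
    client, server = tcp_pair()
    pong = threading.Thread(target=echo, args=(server, msg_size, warmup + iterations))
    pong.start()
    msg = b"x" * msg_size
    for _ in range(warmup):  # untimed round trips before measuring
        client.sendall(msg)
        recv_exact(client, msg_size)
    start = time.perf_counter()
    for _ in range(iterations):
        client.sendall(msg)
        recv_exact(client, msg_size)
    elapsed = time.perf_counter() - start
    pong.join()
    client.close()
    server.close()
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    for size in (1, 64, 1024, 4096):
        print(f"{size:5d} bytes: {pingpong_latency_us(size):8.1f} us")
```

On loopback, the numbers reflect the local kernel stack plus Python overhead. They are therefore not comparable with the 8.6 µs and 10.5 µs measured on the test-bed; the sketch illustrates the method only.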
 
Latency and Bandwidth Evaluation (MTU 1500)
 
No difference in latency between the two stacks
The bandwidth of host-based TCP/IP drops to 4.9 Gbps (more interrupts; higher overhead)
For standard-sized frames, TOE significantly outperforms host-based TCP/IP (segmentation offload is the key)
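Back-of-the-envelope arithmetic shows why standard frames cost the host stack more. At 10 Gbps line rate, 1500-byte frames mean roughly six times as many frames per second as 9000-byte jumbo frames. Without segmentation offload, the host must also build every TCP segment itself. The sketch below makes these assumptions:

- IPv4 and TCP headers carry no options, so MSS = MTU - 40.
- Each frame adds 38 bytes of Ethernet overhead on the wire.

The results are upper bounds. Actual interrupt counts also depend on interrupt coalescing, so frames do not map one-to-one to interrupts.

```python
import math

# Per-frame Ethernet overhead on the wire, in bytes: 7 preamble + 1 start-of-frame
# delimiter + 14 header + 4 FCS + 12 inter-frame gap.
ETH_WIRE_OVERHEAD = 38
# 20-byte IPv4 header + 20-byte TCP header, assuming no options.
IPV4_TCP_HEADERS = 40


def max_frames_per_second(link_bps, mtu):
    """Upper bound on full-sized frames per second at line rate."""
    return link_bps / ((mtu + ETH_WIRE_OVERHEAD) * 8)


def segments_per_write(write_bytes, mtu):
    """TCP segments needed to carry one application write (MSS = MTU - 40)."""
    mss = mtu - IPV4_TCP_HEADERS
    return math.ceil(write_bytes / mss)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        fps = max_frames_per_second(10e9, mtu)
        print(f"MTU {mtu}: {fps:,.0f} frames/s, {1e6 / fps:.2f} us per frame, "
              f"{segments_per_write(64 * 1024, mtu)} segments per 64 KB write")
```

This prints 812,744 frames/s (1.23 us per frame, 45 segments per 64 KB write) for MTU 1500. For MTU 9000 it prints 138,305 frames/s (7.23 us per frame, 8 segments).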
 
Multi-Stream Bandwidth
 
The aggregate throughput of the TOE stays between 7.2 and 7.6 Gbps
 
Hot Spot Latency Test (1 byte)
 
Connection scalability tested up to 12 connections; TOE achieves similar or better scalability than the host-based TCP/IP stack
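In the hot-spot test, several client processes exchange 1-byte messages with a single server at the same time. As clients are added, the server becomes the bottleneck. Below is a minimal Python sketch of such a test, with these assumptions:

- Threads and loopback TCP connections stand in for client processes on separate nodes.
- The reported figure is the mean of each client's half round-trip time. The slide does not say how per-client values are aggregated.
- The function names are hypothetical.

```python
import selectors
import socket
import threading
import time


def tcp_pair():
    """Two connected TCP sockets over loopback (stand-in for client and server nodes)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
    for s in (client, server):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no Nagle delay
    return client, server


def hot_spot_server(socks, iterations):
    """One server thread that echoes 1-byte messages from all clients."""
    sel = selectors.DefaultSelector()
    remaining = {}
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
        remaining[s] = iterations
    while remaining:
        for key, _ in sel.select():
            sock = key.fileobj
            byte = sock.recv(1)
            if byte:
                sock.sendall(byte)
                remaining[sock] -= 1
            if not byte or remaining[sock] == 0:
                sel.unregister(sock)
                del remaining[sock]
    sel.close()


def client_loop(sock, iterations, results, idx):
    """One client: timed 1-byte ping-pong; stores its half round-trip time in us."""
    start = time.perf_counter()
    for _ in range(iterations):
        sock.sendall(b"x")
        if not sock.recv(1):
            raise ConnectionError("server closed the connection")
    results[idx] = (time.perf_counter() - start) / iterations / 2 * 1e6


def hot_spot_latency_us(clients=4, iterations=300):
    """Mean per-client one-way latency with `clients` clients hitting one server."""
    pairs = [tcp_pair() for _ in range(clients)]
    results = [0.0] * clients
    server = threading.Thread(target=hot_spot_server, args=([s for _, s in pairs], iterations))
    workers = [threading.Thread(target=client_loop, args=(c, iterations, results, i))
               for i, (c, _) in enumerate(pairs)]
    server.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    server.join()
    for c, s in pairs:
        c.close()
        s.close()
    return sum(results) / clients


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(f"{n:2d} clients: {hot_spot_latency_us(n):7.1f} us")
```

Sweeping `clients` from 1 to 12 mirrors the range of the test. On a single machine, the clients and server also compete for the same CPUs, which a real multi-node run avoids.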
 
Fan-in and Fan-out Throughput Tests
 
Fan-in and Fan-out tests show similar scalability
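In fan-in, many clients stream data to one server; in fan-out, the server streams to many clients. The sketch below reports aggregate throughput in Mbps, with these assumptions:

- Loopback TCP connections and client threads stand in for client processes on other nodes.
- The server side runs in a single thread: a selector for fan-in, and round-robin sends for fan-out.
- The names and the 64 KB chunk size are illustrative choices, not parameters from the study.

```python
import selectors
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send()/recv() call (illustrative choice)


def tcp_pair():
    """Two connected TCP sockets over loopback (stand-in for client and server)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
    return client, server


def send_bytes(sock, total):
    """Stream `total` bytes on one connection."""
    payload = b"x" * CHUNK
    sent = 0
    while sent < total:
        n = min(CHUNK, total - sent)
        sock.sendall(payload[:n])
        sent += n


def recv_bytes(sock, total):
    """Drain up to `total` bytes from one connection."""
    got = 0
    while got < total:
        chunk = sock.recv(CHUNK)
        if not chunk:
            break
        got += len(chunk)
    return got


def serve_fan_in(socks, per_client):
    """Single server thread receiving from every client via a selector."""
    sel = selectors.DefaultSelector()
    remaining = {}
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
        remaining[s] = per_client
    total = 0
    while remaining:
        for key, _ in sel.select():
            sock = key.fileobj
            chunk = sock.recv(CHUNK)
            total += len(chunk)
            remaining[sock] -= len(chunk)
            if not chunk or remaining[sock] <= 0:
                sel.unregister(sock)
                del remaining[sock]
    sel.close()
    return total


def serve_fan_out(socks, per_client):
    """Single server thread sending to every client, one chunk per client in turn."""
    payload = b"x" * CHUNK
    remaining = {s: per_client for s in socks}
    while remaining:
        for s in list(remaining):
            n = min(CHUNK, remaining[s])
            s.sendall(payload[:n])
            remaining[s] -= n
            if remaining[s] == 0:
                del remaining[s]
    return per_client * len(socks)


def fan_throughput_mbps(clients, per_client, direction):
    """Aggregate throughput for direction 'in' (clients -> server) or 'out'."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    pairs = [tcp_pair() for _ in range(clients)]
    client_fn = send_bytes if direction == "in" else recv_bytes
    server_fn = serve_fan_in if direction == "in" else serve_fan_out
    threads = [threading.Thread(target=client_fn, args=(c, per_client)) for c, _ in pairs]
    start = time.perf_counter()
    for t in threads:
        t.start()
    total = server_fn([s for _, s in pairs], per_client)
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    for c, s in pairs:
        c.close()
        s.close()
    return total * 8 / elapsed / 1e6
```

To trace the two curves, call `fan_throughput_mbps(n, 4 << 20, "in")` and `fan_throughput_mbps(n, 4 << 20, "out")` for increasing `n`. Keeping the server side in one thread concentrates the load on a single endpoint, which is the situation these tests are meant to stress.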
 
MPI-level Comparison
 
MPI latency and bandwidth show trends similar to those of socket-level latency and bandwidth

Application-level Evaluation: Apache Web-Server

[Figure: an Apache Web-server serving several Web Clients]
We perform two kinds of evaluations with the Apache web-server:
1. Single file traces
All clients always request the same file of a given size
Not diluted by other system and workload parameters
2. Zipf-based traces
The probability of requesting the Ith most popular document is inversely proportional to I^α
α is constant for a given trace; it represents the temporal locality of the trace
A high α value represents a high percentage of requests for small files
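The Zipf-based traces follow P(I) ∝ 1/I^α. Below is a short Python sketch of how such a trace could be generated, with these assumptions:

- Documents are identified by popularity rank.
- Requests are drawn independently.
- Mapping ranks to file sizes, which is what ties a high α to small files, is a separate step not shown.
- The function names are hypothetical.

```python
import random


def zipf_probabilities(num_docs: int, alpha: float) -> list[float]:
    """P(rank I) proportional to 1 / I**alpha for I = 1..num_docs, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def zipf_trace(num_docs: int, alpha: float, num_requests: int, seed: int = 0) -> list[int]:
    """Sample a request trace of document ranks (1 = most popular)."""
    rng = random.Random(seed)  # fixed seed so the same trace can be replayed
    ranks = range(1, num_docs + 1)
    return rng.choices(ranks, weights=zipf_probabilities(num_docs, alpha), k=num_requests)


if __name__ == "__main__":
    for alpha in (0.9, 0.75, 0.5, 0.25, 0.1):
        top10 = sum(zipf_probabilities(1000, alpha)[:10])
        print(f"alpha={alpha}: top 10 of 1000 documents get {top10:.1%} of requests")
```

With a larger α, the first few ranks absorb a larger share of the requests, which is the temporal locality the α parameter captures. With α = 0, the distribution is uniform.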
 
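Apache Web-server Evaluation

[Figures: Single File Trace Performance, transactions per second vs. file size (1K to 1M); Zipf Trace Performance, transactions per second vs. α (0.9 to 0.1); Non-TOE vs. TOE]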
 
Conclusions
 
For widespread acceptance of 10GigE in clusters:
Compatibility
Performance
Feature-rich interface
Network-level as well as application-level compatibility is available
On-the-wire protocol is still TCP/IP/Ethernet
Application interface is still the sockets interface
Performance Capabilities
Significant performance improvements compared to the host stack
Close to 65% improvement in bandwidth for standard-sized (1500-byte) frames
Feature-rich interface: Not quite there yet!
Extended Sockets Interface
iWARP offload
 
Continuing and Future Work
 
Comparing 10GigE TOEs to other interconnects
Sockets Interface [cluster05]
MPI Interface
File and I/O sub-systems
Extending the sockets interface to support iWARP capabilities [rait05]
Extending the TOE stack to allow protocol offload for UDP sockets
 
Web Pointers
 
http://public.lanl.gov/radiant
http://nowlab.cse.ohio-state.edu
 
feng@lanl.gov
balaji@cse.ohio-state.edu
 
NOWLAB