Evolution of OpenFabrics Interfaces Architecture

 
Open Fabrics Interfaces
Architecture Introduction
 
Sean Hefty
Intel Corporation
 
Current State of Affairs
 
OFED software is widely adopted as a low-level RDMA API and ships
with upstream Linux, but:
OFED SW was not designed around HPC
Hardware and fabric features are changing
Divergence is driving competing APIs
Interfaces are being extended, and new APIs introduced
Long delay in adoption
Size of clusters and single-node core counts greatly increasing
More applications want to take advantage of high-performance fabrics
 
 
Solution
 
Evolve OpenFabrics: design software interfaces that are aligned with
application requirements
Target needs of HPC
Support multiple interface semantics
Fabric and vendor agnostic
Supportable in upstream Linux
 
Enabling through OpenFabrics
 
Leveraging existing open source community
Broad ecosystem
Application developers and vendors
Active community engagement
Drive high-performance software APIs
Take advantage of available hardware features
Support multiple product generations
 
 
OFIWG Charter
 
Develop an extensible, open source framework and interfaces aligned
with ULP and application needs for high-performance fabric services
Software leading hardware
Enable future hardware features
Minimal impact to applications
Minimize impedance match between ULPs and
network APIs
Craft optimal APIs
Detailed analysis on MPI, SHMEM, and other PGAS
languages
Focus on other applications – storage, databases, …
 
 
Call for Participation
 
OFI WG is open participation
Contact the ofiwg mail list for meeting details
ofiwg@lists.openfabrics.org
Source code available through github
github.com/ofiwg
Presentations / meeting minutes available from OFA download directory
Help OFI WG understand workload requirements and drive software design
 
 
Enable..
 
Scalability and high performance
Optimized software path to hardware
Independent of hardware interface, version, features
Reduced cache and memory footprint
Scalable address resolution and storage
Tight data structures
App-centric
Analyze application needs
Implement them in a coherent, concise, high-performance manner
Application focused APIs
Extensible and adaptable
More agile development
Time-boxed, iterative development
 
Verbs Semantic Mismatch
 
Current RDMA APIs (generic send call, 50-60 lines of C code):
Allocate WR; allocate SGE
Format SGE – 3 writes; format WR – 6 writes
Checks – 9 branches, plus three loops adding 3 + 1 + 3 more branches
Evolved fabric interfaces (optimized send call, 25-30 lines of C code):
Direct call – 3 writes; checks – 2 branches
Reduce setup cost – tighter data
Eliminate loops and branches – remaining branches predictable
Selective optimization paths to HW – manual function expansion
Application-Centric Interfaces
 
Collect application requirements
Identify common, fast path usage models
Too many use cases to optimize them all
Build primitives around 
fabric services
Not device specific interface
 
Reducing instruction count requires a better application impedance match
OFA Software Evolution
 
Transition from disjoint APIs – Verbs and the RDMA CM, layered over
the uVerbs command interface and verbs providers – to a cohesive set
of fabric interfaces: libfabric with FI providers
Fabric Interfaces Framework
 
Take growth into consideration
Reduce effort to incorporate new application features
Addition of new interfaces, structures, or fields
Modification of existing functions
Allow time to design new interfaces correctly
Support prototyping interfaces prior to integration
 
www.openfabrics.org
 
Focus on longer-lived interfaces: software leading hardware
Fabric Interfaces
 
The framework defines multiple interface sets:
Control interface, message queue, RMA, atomics, tag matching,
triggered operations, addressing services, CM services
Fabric providers supply optimized implementations of the same
interface sets
 
Fabric Interfaces
 
Defines philosophy for interfaces and extensions
Focus interfaces on the semantics and services offered by the
hardware, not the hardware implementation
Exports a minimal API
Control interface
Defines fabric interfaces
API sets for specific functionality
Defines core object model
Object-oriented design, but C-interfaces
 
 
Fabric Interfaces Architecture
 
Based on object-oriented
programming concepts
Derived objects define
interfaces
New interfaces exposed
Define behavior of inherited
interfaces
Optimize implementation
 
 
Control Interfaces
 
fi_getinfo – application specifies desired functionality; discover
fabric providers and services; identify resources and addressing
fi_fabric – open a set of fabric interfaces and resources
fi_register – dynamic providers publish control interfaces
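The discovery step can be sketched as a toy, self-contained model: the application passes capability hints and the framework returns a matching provider. All names here (`CAP_*`, `struct provider`, `getinfo`) are illustrative stand-ins, not the real libfabric API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits an application may request via hints. */
#define CAP_MSG    (1u << 0)
#define CAP_RMA    (1u << 1)
#define CAP_TAGGED (1u << 2)

struct provider {
    const char *name;
    uint32_t    caps;   /* services this provider's fabric offers */
};

/* Registered providers publish their control information. */
static const struct provider providers[] = {
    { "sockets", CAP_MSG },
    { "verbs",   CAP_MSG | CAP_RMA },
    { "psm",     CAP_MSG | CAP_RMA | CAP_TAGGED },
};

/* getinfo-style discovery: return the first provider whose capability
 * set is a superset of what the application asked for. */
static const struct provider *getinfo(uint32_t hints)
{
    for (size_t i = 0; i < sizeof(providers) / sizeof(providers[0]); i++) {
        if ((providers[i].caps & hints) == hints)
            return &providers[i];
    }
    return NULL;   /* no fabric offers the requested semantics */
}
```

In the real interface the hints and results are far richer (addressing formats, resource limits, attributes), but the shape is the same: the app states semantics, the framework selects the provider.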
 
Application Semantics
 
Progress
Application or hardware driven
Data versus control interfaces
Ordering
Message ordering
Data delivery order
Multi-threading and locking model
Compile and run-time options
 
Get / set using control interfaces
 
Fabric Object Model
 
Fabric – boundary of resource sharing
Resource domains contain event queues, event counters, active
endpoints, and address vectors
Passive endpoints pair with an event queue
Provider abstracts multiple NICs on the physical fabric
Software objects usable across multiple HW providers
 
Endpoint Interfaces
 
Properties: endpoint type, capabilities, protocol
Interfaces: CM, message transfers, RMA, tagged, atomics, triggered
Software path to hardware optimized based on endpoint properties
Aliasing supports multiple software paths to hardware
 
Application Configured Interfaces
 
App specifies comm model; provider directs app to best API sets
Endpoint capabilities and data transfer flags select the path to the
NIC – e.g. inline send for small messages, send for large messages,
plus RMA read/write ops
 
Event Queues
 
Asynchronous event reporting
Properties: format (context only, data, tagged, generic), domain,
wait object (none, fd, mwait, user specified)
Compact, optimized data structures
Optimize interface around reporting successful operations
Event counters support lightweight event reporting
 
Event Queues
 
App selects completion structure
Generic verbs completion example – send: +4-6 writes, +2 branches;
recv: +10-13 writes, +4 branches
Application optimized completion (op context only): +1 write, +0 branches
 
Address Vectors
 
Fabric specific addressing requirements
Store addresses/host names; insert a range of addresses with a single call
Share between processes
Reference entries by handle or index – handle may be an encoded
fabric address; reference vector for group communication
Example only:

Start Range   End Range   Base LID   SL
host10        host1000    50         1
host1001      host4999    2000       2

Enable provider optimization techniques – greatly reduce storage
requirements
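The range table above suggests how a provider can compress storage. Below is a self-contained sketch of that idea; the types and numbers are illustrative, taken from the example table (991 hosts in the first range, 3999 in the second):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range-compressed address vector: one entry covers a
 * contiguous block of hosts, so storage is per range, not per peer. */
struct av_range {
    uint32_t first_index;  /* index of first host in the range */
    uint32_t count;        /* number of hosts covered */
    uint16_t base_lid;     /* LID of the first host */
    uint8_t  sl;           /* service level for the range */
};

static const struct av_range av[] = {
    { 0,   991,  50,   1 },   /* host10 .. host1000   */
    { 991, 3999, 2000, 2 },   /* host1001 .. host4999 */
};

/* Resolve a peer index to its fabric address (LID + SL). */
static int av_lookup(uint32_t index, uint16_t *lid, uint8_t *sl)
{
    for (size_t i = 0; i < sizeof(av) / sizeof(av[0]); i++) {
        if (index >= av[i].first_index &&
            index < av[i].first_index + av[i].count) {
            *lid = (uint16_t)(av[i].base_lid + (index - av[i].first_index));
            *sl  = av[i].sl;
            return 0;
        }
    }
    return -1;   /* address was never inserted */
}
```

Two table rows address 4990 peers; a flat per-peer table would need 4990 entries, which is the storage reduction the slide is pointing at.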
 
Summary
 
These concepts are necessary, not revolutionary
Communication addressing, optimized data transfers, app-centric
interfaces, future looking
Want a solution where the pieces fit tightly together
 
Repeated Call for Participation
 
Co-chair (sean.hefty@intel.com)
Meets Tuesdays from 9-10 PST / 12-1 EST
Links
Mailing list subscription
http://lists.openfabrics.org/mailman/listinfo/ofiwg
Document downloads
https://www.openfabrics.org/downloads/OFIWG/
libfabric source tree
www.github.com/ofiwg/libfabric
libfabric sample programs
www.github.com/ofiwg/fabtests
 
 
Backup
 
 
 
Verbs API Mismatch
 
Significant SW overhead: the application's request is just
<buffer, length, context> – 3 x 8 = 24 bytes of data needed – yet
SGE + WR = 88 bytes are allocated and 28 additional bytes initialized.

struct ibv_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

struct ibv_send_wr {
    uint64_t            wr_id;
    struct ibv_send_wr *next;       /* requests may be linked:
                                       next must be set to NULL */
    struct ibv_sge     *sg_list;    /* must link to separate SGL
                                       and initialize count */
    int                 num_sge;
    enum ibv_wr_opcode  opcode;     /* app must set; provider must
                                       switch on opcode */
    int                 send_flags; /* must clear flags */
    uint32_t            imm_data;
    ...
};
 
Verbs Provider Mismatch
 
For each work request:                 (most often 1 – overlap operations)
    Check for available queue space
    Check SGL size                     (often 1 or 2, fixed in source)
    Check valid opcode                 (artifact of API)
    Check flags x 2
    Check specific opcode
    Switch on QP type                  (QP type usually fixed in source)
        Switch on opcode
            Check flags                (flags may be fixed, or app may
                                        have already taken branches)
            For each SGE:
                Check size
                Loop over length
            Check flags
            Check
    Check for last request
    Other checks x 3
19+ branches including loops; 100+ lines of C code; 50-60 lines of
code to reach HW
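As a self-contained caricature (hypothetical types, not the real verbs provider code) of why this path is expensive: the generic entry point must re-validate on every call state that a purpose-built send function can assume up front.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified work request; not the real ibv_send_wr. */
enum opcode { OP_SEND, OP_WRITE, OP_READ, OP_MAX };

struct sge { uint64_t addr; uint32_t length; uint32_t lkey; };
struct wr  { struct wr *next; struct sge *sg_list; int num_sge;
             enum opcode opcode; int flags; };

static int branches;   /* counts runtime checks taken on each path */
#define CHECK(cond) (branches++, (cond))

/* Generic path: must re-validate everything on every call. */
static int generic_post(struct wr *wr, int queue_space, int max_sge)
{
    for (; wr; wr = wr->next) {
        if (CHECK(queue_space-- <= 0)) return -1;
        if (CHECK(wr->num_sge > max_sge)) return -1;
        if (CHECK(wr->opcode >= OP_MAX)) return -1;
        if (CHECK(wr->flags & ~0xF)) return -1;
        for (int i = 0; CHECK(i < wr->num_sge); i++)
            if (CHECK(wr->sg_list[i].length == 0)) return -1;
    }
    return 0;
}

/* Direct call: usage model fixed up front, almost nothing to check. */
static int direct_send(uint64_t addr, uint32_t len, int queue_space)
{
    if (CHECK(queue_space <= 0)) return -1;
    (void)addr; (void)len;   /* the 3 descriptor writes would go here */
    return 0;
}
```

The point echoes the slide: the branches are an artifact of the API's generality; the direct call is not smarter, it simply has less to decide.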
 
Verbs Completions Mismatch
 
Application-accessed fields vs. what the provider must fill out:

struct ibv_wc {
    uint64_t           wr_id;
    enum ibv_wc_status status;     /* app must check both return code
                                      and status to determine if a
                                      request completed successfully */
    enum ibv_wc_opcode opcode;
    uint32_t           vendor_err;
    uint32_t           byte_len;
    uint32_t           imm_data;
    uint32_t           qp_num;
    uint32_t           src_qp;
    int                wc_flags;
    uint16_t           pkey_index;
    uint16_t           slid;
    uint8_t            sl;
    uint8_t            dlid_path_bits;
};

Provider must fill out all fields, even those ignored by the app, and
must handle all types of completions from any QP
Developer must determine if fields apply to their QP
Single structure is 48 bytes – likely to cross a cacheline boundary
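The footprint argument can be checked with sizeof. The first struct below mirrors the field list above with plain ints standing in for the enums (4 bytes on common ABIs); the 48-byte figure assumes an LP64 target, so it is illustrative rather than guaranteed:

```c
#include <stdint.h>

/* Mirror of the generic verbs completion above. */
struct generic_wc {
    uint64_t wr_id;
    int      status;           /* stand-in for enum ibv_wc_status */
    int      opcode;           /* stand-in for enum ibv_wc_opcode */
    uint32_t vendor_err;
    uint32_t byte_len;
    uint32_t imm_data;
    uint32_t qp_num;
    uint32_t src_qp;
    int      wc_flags;
    uint16_t pkey_index;
    uint16_t slid;
    uint8_t  sl;
    uint8_t  dlid_path_bits;
};

/* App-selected, context-only completion: one write per event. */
struct ctx_wc {
    void *op_context;
};
```

A context-only completion moves one eighth of the data per event and, being 8 bytes aligned to 8, never splits across a 64-byte cacheline.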
 
RDMA CM Mismatch
 
Want: reliable data transfers, with zero copies, to thousands of
processes
RDMA interfaces expose:

struct rdma_route {
    struct rdma_addr        addr;      /* src/dst addresses stored
                                          per endpoint */
    struct ibv_sa_path_rec *path_rec;  /* path record per endpoint */
    ...
};                                     /* 456 bytes per endpoint */

struct rdma_cm_id {...};

rdma_create_id()
rdma_resolve_addr()    /* resolves a single address and path at a time */
rdma_resolve_route()
rdma_connect()         /* all-to-all connected model for best performance */
 
Progress
 
Ability of the underlying implementation to
complete processing of an asynchronous
request
Need to consider ALL asynchronous requests
Connections, address resolution, data transfers,
event processing, completions, etc.
HW/SW mix
 
All(?) current solutions require
significant software components
 
Progress
 
Support two progress models
Automatic and implicit
Separate operations as belonging to one of two progress domains
Data or control
Report progress model for each domain, for example:

            Implicit    Automatic
Data        Software    Hardware offload
Control     Software    Kernel services
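A provider's row in a table like the sample above can be modeled directly; libfabric later exposed this as per-domain progress attributes. The sketch below is a simplified, hypothetical stand-in:

```c
/* Per-domain progress model, as reported by a provider. */
enum progress { PROGRESS_IMPLICIT, PROGRESS_AUTO };

struct domain_attr {
    enum progress data_progress;
    enum progress control_progress;
};

/* A software provider: the app must drive progress in both domains. */
static const struct domain_attr sw_provider = {
    PROGRESS_IMPLICIT, PROGRESS_IMPLICIT
};

/* An offload provider: hardware moves data and kernel services handle
 * control, so no explicit progress calls are needed. */
static const struct domain_attr hw_provider = {
    PROGRESS_AUTO, PROGRESS_AUTO
};

/* Must the application call into the API for operations to complete? */
static int app_must_poll(const struct domain_attr *attr)
{
    return attr->data_progress == PROGRESS_IMPLICIT ||
           attr->control_progress == PROGRESS_IMPLICIT;
}
```

An app that learns `app_must_poll()` is false can block on a native wait object without starving outstanding transfers.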
 
 
Automatic Progress
 
Implies hardware offload model
Or standard kernel services / threads for control
operations
Once an operation is initiated, it will complete
without further user intervention or calls into the
API
Automatic progress meets implicit model by
definition
 
 
Implicit Progress
 
Implies significant software component
Occurs when reading or waiting on EQ(s)
Application can use separate EQs for control
and data
Progress limited to objects associated with
selected EQ(s)
App can request automatic progress
E.g. app wants to wait on native wait object
Implies provider allocated threading
 
 
Ordering
 
Applies to a single initiator endpoint performing
data transfers to one target endpoint over the
same data flow
Data flow may be a conceptual QoS level or path
through the network
Separate ordering domains
Completions, message, data
Fenced ordering may be obtained using fi_sync
operation
 
 
Completion Ordering
 
Order in which operation completions are
reported relative to their submission
Unordered or ordered
No defined requirement for ordered completions
Default: unordered
 
 
Message Ordering
 
Order in which message (transport) headers are
processed
I.e. whether transport messages are received in or out of order
Determined by selection of ordering bits
[Read | Write | Send]   After   [Read | Write | Send]
RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS
Example:
fi_order = 0    // unordered
fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS    // IB/iWarp like ordering
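The ordering bits compose as a plain bitmask; a self-contained sketch (bit values here are illustrative — libfabric's real constants are the FI_ORDER_* flags — only the set/unset semantics matter):

```c
#include <stdint.h>

/* [Read | Write | Send] After [Read | Write | Send] ordering bits. */
#define RAR (1u << 0)   /* read after read   */
#define RAW (1u << 1)   /* read after write  */
#define RAS (1u << 2)   /* read after send   */
#define WAR (1u << 3)   /* write after read  */
#define WAW (1u << 4)   /* write after write */
#define WAS (1u << 5)   /* write after send  */
#define SAR (1u << 6)   /* send after read   */
#define SAW (1u << 7)   /* send after write  */
#define SAS (1u << 8)   /* send after send   */

/* The IB/iWarp-like ordering from the example: WAR and SAR are clear,
 * so writes and sends may bypass earlier reads. */
static const uint32_t ib_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS;

/* May a send pass an earlier read on this flow? Yes iff SAR is clear. */
static int send_may_pass_read(uint32_t fi_order)
{
    return !(fi_order & SAR);
}
```

A provider advertises the strongest mask it can honor; the app intersects that with the weakest mask its protocol tolerates.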
 
 
Data Ordering
 
Delivery order of transport data into target
memory
Ordering per byte-addressable location
I.e. access to the same byte in memory
Ordering constrained by message ordering rules
Must at least have message ordering first
 
 
Data Ordering
 
Ordering limited to message order size
E.g. MTU
In order data delivery if transfer <= message order size
WAW, RAW, WAR sizes?
Message order size = 0
No data ordering
Message order size = -1
All data ordered
 
 
Other Ordering Rules
 
Ordering to different target endpoints not defined
Per message ordering semantics implemented
using different data flows
Data flows may be less flexible, but easier to optimize for
Endpoint aliases may be configured to use different
data flows
 
 
Multi-threading and Locking
 
Support both thread safe and lockless models
Compile time and run time support
Run-time limited to compiled support
Lockless (based on MPI model)
Single – single-threaded app
Funneled – only 1 thread calls into interfaces
Serialized – only 1 thread at a time calls into interfaces
Thread safe
Multiple – multi-threaded app, with no restrictions
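One way to read "run-time limited to compiled support" is as taking the weaker of the two levels; a minimal sketch with hypothetical names:

```c
/* MPI-style threading levels, ordered weakest to strongest. */
enum thread_model {
    THREAD_SINGLE,      /* single-threaded app             */
    THREAD_FUNNELED,    /* only 1 thread calls interfaces  */
    THREAD_SERIALIZED,  /* only 1 thread at a time         */
    THREAD_SAFE         /* no restrictions (locking used)  */
};

/* Run-time request limited to compiled support: grant the weaker of
 * the compiled-in guarantee and the requested one. */
static enum thread_model negotiate(enum thread_model compiled,
                                   enum thread_model requested)
{
    return requested < compiled ? requested : compiled;
}
```

A library built lockless can never grant THREAD_SAFE at run time, while a thread-safe build can still be relaxed toward a cheaper model the app promises to honor.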
 
 
Buffering
 
Support both application and network buffering
Zero-copy for high-performance
Network buffering for ease of use
Buffering in local memory or NIC
In some cases, buffered transfers may be higher-performing (e.g. “inline”)
Registration option for local NIC access
Migration to fabric managed registration
Required registration for remote access
Specify permissions
 
 
Scalable Transfer Interfaces
 
Application optimized code paths based on usage model
Optimize call(s) for single work request
Single data buffer
Still support more complex WR lists/SGL
Per endpoint send/receive operations
Separate RMA function calls
Pre-configure data transfer flags
Known before post request
Select software path through provider
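The single-buffer fast path can be sketched by counting descriptor stores on two hypothetical provider paths (names and write counts are illustrative, echoing the earlier 3-writes-vs-many comparison):

```c
#include <stddef.h>

struct iovec_entry { void *base; size_t len; };

/* Hypothetical provider internals: counts descriptor stores. */
static int writes_to_hw;

/* Generic path: handles arbitrary WR lists / SGLs. */
static int post_sendv(const struct iovec_entry *iov, size_t count)
{
    (void)iov;
    for (size_t i = 0; i < count; i++)
        writes_to_hw += 2;   /* addr + length per SGL entry */
    writes_to_hw += 2;       /* descriptor header + doorbell */
    return 0;
}

/* Fast path for the common case: one buffer, transfer flags
 * preconfigured at endpoint setup, descriptor mostly prebuilt. */
static int post_send(const void *buf, size_t len)
{
    (void)buf; (void)len;
    writes_to_hw += 3;       /* addr, length, doorbell */
    return 0;
}
```

Both calls can coexist on one endpoint; pre-configuring flags is what lets the provider bind the cheap path before the first post.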
 