Fabric Interfaces Architecture Overview

 
Fabric Interfaces
Architecture
 
Sean Hefty - Intel Corporation
 
Changes
 
v2
Remove interface object
Add open interface as base object
Add SRQ object
Add EQ group object
 
www.openfabrics.org
 
Overview
 
Object Model
Do we have the right types of objects defined?
Do we have the correct object relationships?
Interface Synopsis
High-level description of object operations
Is functionality missing?
Are interfaces associated with the right object?
Architectural Semantics
Do the semantics match well with the apps?
What semantics are missing?
 
Object “Class” Model
 
Objects represent a collection of attributes and interfaces
I.e. an object-oriented programming model
Consider architectural model only at this point
 
Objects do not necessarily map directly to hardware or software objects
 
Conceptual Object Hierarchy
 
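[Figure: conceptual object hierarchy showing object “inheritance”; labels include Fabric, Domain, Passive and Active Endpoint (Msg, Datagram, RDM), Shared Receive Queue, Address Vector (Map, Index), Event Queue (CM, AV, Completion), EQ Group, Counter, Memory Region, and Interfaces]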
 
Object Relationships
 
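[Figure: object relationships showing object “scope”; the Fabric scopes the Passive EP, its EQ, and the Domain; the Domain scopes the AV (Map, Index), Active EP (Msg, Datagram, RDM), Shared RQ, EQs (CQ, CM, AV), EQ Group, Counter, and MR]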
 
Fabric
 
Represents a communication domain or boundary
Single IB or RoCE subnet, IP (iWARP) network, Ethernet subnet
Multiple local NICs / ports
Topology data, network time stamps
Determines native addressing
Mapped addressing possible
GID/LID versus IP
 
Passive (Fabric) EP
 
Listening endpoint
Connection-oriented protocols
Wildcard listen across multiple NICs / ports
Bind to address to restrict listen
Listen may migrate with address
 
Fabric EQ
 
Associated with passive endpoint(s)
Reports connection requests
Could be used to report fabric events
 
Resource Domain
 
Boundary for resource sharing
Physical or logical NIC
Command queue
Container for data transfer resources
A provider may define multiple domains for a single NIC
Dependent on resource sharing
 
Domain Address Vectors
 
Maintains list of remote endpoint addresses
Map – native addressing
Index – ‘rank’-based addressing
Resolves higher-level addresses into fabric addresses
Native addressing abstracted from user
Handles address and route changes
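
To make the Map versus Index distinction concrete, here is a minimal sketch in C. It assumes an address vector is a plain array of already-resolved 64-bit native addresses; the names (struct av, av_insert, av_lookup, av_update) are hypothetical and do not describe an actual provider interface. Index mode hands the application a rank, while map mode hands back the native address itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address vector holding resolved native addresses of
 * remote endpoints.  Illustrative only, not a real provider API. */
#define AV_MAX 64

enum av_type { AV_MAP, AV_INDEX };

struct av {
    enum av_type type;
    uint64_t native[AV_MAX];   /* e.g. an encoded GID/LID or IP address */
    size_t count;
};

/* Insert one native address.  Returns the handle the app passes to data
 * transfer calls: the rank (insertion position) for AV_INDEX, or the
 * native address itself for AV_MAP.  UINT64_MAX means the AV is full. */
uint64_t av_insert(struct av *av, uint64_t native_addr)
{
    if (av->count == AV_MAX)
        return UINT64_MAX;
    size_t rank = av->count++;
    av->native[rank] = native_addr;
    return av->type == AV_INDEX ? (uint64_t)rank : native_addr;
}

/* Translate a handle into the native address used on the wire.
 * Returns 0 on success, -1 if the handle is unknown. */
int av_lookup(const struct av *av, uint64_t handle, uint64_t *native_addr)
{
    for (size_t i = 0; i < av->count; i++) {
        int hit = av->type == AV_INDEX ? handle == i : handle == av->native[i];
        if (hit) {
            *native_addr = av->native[i];
            return 0;
        }
    }
    return -1;
}

/* With index addressing the provider can follow an address or route
 * change by updating the entry in place; the app's rank stays valid. */
int av_update(struct av *av, uint64_t rank, uint64_t new_native)
{
    if (av->type != AV_INDEX || rank >= av->count)
        return -1;
    av->native[rank] = new_native;
    return 0;
}
```

Because an index-mode handle is only a rank, the stored address can change underneath it without invalidating anything the application holds, which is one way to read the slide's last bullet.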
 
Domain Endpoints
 
Data transfer portal
Send / receive queues
Command queues
Ring buffers
Multiple types defined
Connection-oriented / connectionless
Reliable / unreliable
Message / stream
 
Domain Event Queues
 
Reports asynchronous events
Unexpected errors reported ‘out of band’
Events separated into ‘EQ domains’
CM, AV, completions
1 EQ domain per EQ
Future support for merged EQ domains
 
EQ Groups
 
Collection of EQs
Conceptually shares same wait object
Grouping for progress and wait operations
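
A minimal sketch of how an EQ group might poll its member EQs, assuming each EQ is a small ring of integer events. All names are hypothetical. The shared wait object is deliberately left out: blocking would need an OS primitive such as a condition variable or file descriptor, which this sketch does not model.

```c
#include <stddef.h>

/* Hypothetical EQ: a fixed-size ring of integer events. */
#define EQ_DEPTH 8

struct eq {
    int events[EQ_DEPTH];
    size_t head, tail;          /* head: next to read, tail: next free slot */
};

int eq_write(struct eq *eq, int event)
{
    if (eq->tail - eq->head == EQ_DEPTH)
        return -1;              /* full */
    eq->events[eq->tail++ % EQ_DEPTH] = event;
    return 0;
}

int eq_read(struct eq *eq, int *event)
{
    if (eq->head == eq->tail)
        return -1;              /* empty */
    *event = eq->events[eq->head++ % EQ_DEPTH];
    return 0;
}

/* Hypothetical EQ group: a collection of EQs polled together. */
#define GROUP_MAX 4

struct eq_group {
    struct eq *eqs[GROUP_MAX];
    size_t count;
    size_t next;                /* round-robin start position */
};

int eq_group_add(struct eq_group *grp, struct eq *eq)
{
    if (grp->count == GROUP_MAX)
        return -1;
    grp->eqs[grp->count++] = eq;
    return 0;
}

/* Check each EQ in the group at most once, starting after the EQ that
 * last produced an event so a busy EQ cannot starve the others.
 * Returns the EQ that supplied *event, or NULL if all are empty. */
struct eq *eq_group_poll(struct eq_group *grp, int *event)
{
    for (size_t i = 0; i < grp->count; i++) {
        size_t k = (grp->next + i) % grp->count;
        if (eq_read(grp->eqs[k], event) == 0) {
            grp->next = (k + 1) % grp->count;
            return grp->eqs[k];
        }
    }
    return NULL;
}
```

Returning the source EQ lets the caller dispatch on which kind of event (CM, AV, completion) it received, since each EQ carries a single EQ domain.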
 
Shared Receive Queue
 
Shares buffers among multiple endpoints
Not addressable
Addressable SRQs are abstracted within the endpoint
 
Domain Counters
 
Provides a count of successful completions of asynchronous operations
Conceptual HW counter
Count is independent of any actual event reported to the user through an EQ
 
Domain Memory Regions
 
Memory ranges accessible by fabric resources
Local and/or remote access
Defines permissions for remote access
 
Interface Synopsis
 
Operations associated with identified ‘classes’
General functionality, versus detailed methods
The full set of methods is not defined here
Detailed behavior (e.g. blocking) is not defined
Identify missing and unneeded functionality
Mapping of functionality to objects
 
Use timeboxing to limit the scope of interfaces to refine by a target date
 
 
Architectural Semantics
 
Progress
Ordering - completions and data delivery
Multi-threading and locking model
Buffering
Function signatures and semantics
 
Once defined, object and interface semantics cannot change – semantic changes require new objects and interfaces
Need refining
 
Progress
 
Ability of the underlying implementation to complete processing of an asynchronous request
Need to consider ALL asynchronous requests
Connections, address resolution, data transfers, event processing, completions, etc.
HW/SW mix
 
All(?) current solutions require significant software components
 
Progress - Proposal
 
Support two progress models
Automatic and implicit (name?)
Separate operations as belonging to one of two progress domains
Data or control
Report progress model for each domain
 
Progress - Proposal
 
Implicit progress
Occurs when reading or waiting on EQ(s)
Application can use separate EQs for control and data
Progress limited to objects associated with selected EQ(s)
App can request automatic progress
E.g. app wants to wait on native wait object
Implies provider allocated threading
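
The implicit model can be sketched as follows, under the assumption that each outstanding request needs a fixed number of software steps before it completes. All names are hypothetical. Reading an EQ advances only the requests associated with that EQ, which mirrors “progress limited to objects associated with selected EQ(s)”; a request on an EQ the app never reads makes no progress at all.

```c
#include <stddef.h>

/* Hypothetical model of implicit progress: requests that need software
 * help advance only while the app reads the EQ they report to. */
#define MAX_OPS 16

struct op {
    int id;
    int steps_left;              /* software work still required */
};

struct eq {
    struct op pending[MAX_OPS];  /* requests associated with this EQ */
    size_t npending;
    int done[MAX_OPS];           /* ring of completed request ids */
    size_t ndone, nread;
};

/* Associate a new request with the EQ.  Fails if pending requests plus
 * unread completions would exceed the EQ's capacity. */
int eq_post(struct eq *eq, int id, int steps)
{
    if (eq->npending + (eq->ndone - eq->nread) >= MAX_OPS)
        return -1;
    eq->pending[eq->npending++] = (struct op){ id, steps };
    return 0;
}

/* Advance every request on this EQ by one step; other EQs are untouched. */
static void eq_progress(struct eq *eq)
{
    size_t keep = 0;
    for (size_t i = 0; i < eq->npending; i++) {
        struct op op = eq->pending[i];
        if (--op.steps_left <= 0)
            eq->done[eq->ndone++ % MAX_OPS] = op.id;   /* completion event */
        else
            eq->pending[keep++] = op;
    }
    eq->npending = keep;
}

/* Reading the EQ is what drives progress.  Returns -1 if nothing has
 * completed yet. */
int eq_read(struct eq *eq, int *id)
{
    eq_progress(eq);
    if (eq->nread == eq->ndone)
        return -1;
    *id = eq->done[eq->nread++ % MAX_OPS];
    return 0;
}
```

Splitting control and data requests across two such EQs lets an application progress one domain without paying for the other; automatic progress would instead run eq_progress from a provider-owned thread.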
 
Ordering - Completions
 
Outbound
Is any ordering guarantee needed?
‘Sync’ call completion guarantees that all selected operations previously issued on an endpoint have completed
Inbound
Ordering guaranteed only for receives posted to message queues
 
Ordering – Data Delivery
 
Interfaces may imply specific ordering rules
E.g. Tagged – “If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the second message if the first one is still pending. If a receiver posts two receives in succession, and both match the same message, then the second receive operation cannot be satisfied by this message, if the first one is still pending.”
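
One straightforward way to satisfy both quoted rules is to keep posted receives and unexpected messages in FIFO order and always take the oldest match. The sketch below assumes that design, a tag-plus-ignore-mask match, and non-negative ids; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag-matching queues.  Arriving messages are matched against
 * posted receives oldest-first; newly posted receives are matched against
 * unexpected messages oldest-first.  Ids are assumed non-negative. */
#define QMAX 16

struct recv { uint64_t tag, ignore; int id; };  /* ignore: tag bits not compared */
struct msg  { uint64_t tag; int id; };

struct tag_queues {
    struct recv posted[QMAX];
    size_t nposted;
    struct msg unexp[QMAX];
    size_t nunexp;
};

static int tag_match(const struct recv *r, const struct msg *m)
{
    return ((r->tag ^ m->tag) & ~r->ignore) == 0;
}

/* A message arrives.  Returns the id of the receive it satisfies, -1 if it
 * was queued as unexpected, or -2 if the unexpected queue is full. */
int msg_arrive(struct tag_queues *q, struct msg m)
{
    for (size_t i = 0; i < q->nposted; i++) {
        if (tag_match(&q->posted[i], &m)) {
            int id = q->posted[i].id;
            for (size_t j = i + 1; j < q->nposted; j++)   /* keep post order */
                q->posted[j - 1] = q->posted[j];
            q->nposted--;
            return id;
        }
    }
    if (q->nunexp == QMAX)
        return -2;
    q->unexp[q->nunexp++] = m;
    return -1;
}

/* A receive is posted.  Returns the id of the earliest unexpected message
 * it matches, -1 if it was queued, or -2 if the posted queue is full. */
int recv_post(struct tag_queues *q, struct recv r)
{
    for (size_t i = 0; i < q->nunexp; i++) {
        if (tag_match(&r, &q->unexp[i])) {
            int id = q->unexp[i].id;
            for (size_t j = i + 1; j < q->nunexp; j++)    /* keep arrival order */
                q->unexp[j - 1] = q->unexp[j];
            q->nunexp--;
            return id;
        }
    }
    if (q->nposted == QMAX)
        return -2;
    q->posted[q->nposted++] = r;
    return -1;
}
```

Linear search keeps the ordering argument obvious; a real matcher would likely use hashed or per-tag lists, but it would have to preserve the same oldest-first behavior.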
 
Ordering – Data Delivery
 
Required ordering specified by application
[read | write | send] after [read | write | send]
RAR, RAW, WAR, WAW, SAW
Ordering may differ based on message size
E.g. size ≥ MTU
 
Needs more analysis with a formal proposal
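
One way an application might express the required ordering is a bit mask with one bit per “X after Y” pair. The sketch below assumes that encoding; the macro and function names are hypothetical, and the size rule (no ordering for transfers at or above the MTU) is only an assumed reading of the “size ≥ MTU” example, since the slide defers the details to a formal proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of "[read | write | send] after [read | write | send]"
 * as a bit mask supplied by the application. */
enum op_kind { OP_READ, OP_WRITE, OP_SEND };

#define ORDER_BIT(later, earlier) (1u << ((later) * 3u + (earlier)))

/* The combinations listed on the slide; e.g. RAW = read after write. */
#define ORDER_RAR ORDER_BIT(OP_READ,  OP_READ)
#define ORDER_RAW ORDER_BIT(OP_READ,  OP_WRITE)
#define ORDER_WAR ORDER_BIT(OP_WRITE, OP_READ)
#define ORDER_WAW ORDER_BIT(OP_WRITE, OP_WRITE)
#define ORDER_SAW ORDER_BIT(OP_SEND,  OP_WRITE)

/* May `later` be executed ahead of `earlier`?  Only when the app did not
 * request that pairing.  Assumed size rule: transfers of at least `mtu`
 * bytes carry no ordering guarantee. */
bool may_reorder(unsigned required, enum op_kind earlier, enum op_kind later,
                 size_t size, size_t mtu)
{
    if (size >= mtu)
        return true;
    return (required & ORDER_BIT(later, earlier)) == 0;
}
```

For example, an application that only needs a send to act as a completion flag for preceding writes would request ORDER_SAW alone, leaving the provider free to reorder everything else.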
 
Multi-threading and Locking
 
Support both thread safe and lockless models
Lockless – based on MPI model
Single – single-threaded app
Funneled – only 1 thread calls into interfaces
Serialized – only 1 thread at a time calls into interfaces
Are all models needed?
Thread safe
Multiple – multi-threaded app, with no restrictions
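
Since the levels come from the MPI model, they form an ordered scale, and a provider can grant the requested level or fall back to the best one it supports. The sketch below assumes that negotiation, which is similar to how MPI_Init_thread reports a provided level; the enum and function names are hypothetical.

```c
/* Hypothetical threading levels modeled on the MPI levels named above,
 * ordered from most restrictive to least restrictive. */
enum thread_level {
    THREAD_SINGLE,      /* single-threaded app */
    THREAD_FUNNELED,    /* only 1 thread calls into the interfaces */
    THREAD_SERIALIZED,  /* only 1 thread at a time calls into the interfaces */
    THREAD_MULTIPLE     /* thread safe: no restrictions */
};

/* Grant the requested level if the provider supports it; otherwise return
 * the best level it does support and let the app decide whether that is
 * acceptable. */
enum thread_level thread_level_grant(enum thread_level requested,
                                     enum thread_level supported)
{
    return requested <= supported ? requested : supported;
}

/* Under the three lockless levels the app guarantees that calls never
 * overlap, so the provider can skip its internal locks. */
int provider_needs_locks(enum thread_level granted)
{
    return granted == THREAD_MULTIPLE;
}
```

The payoff of the lockless levels is in the second function: only THREAD_MULTIPLE forces the provider to pay for internal serialization.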
 
Buffering
 
Support both application and network buffering
Zero-copy for high-performance
Network buffering for ease of use
Buffering in local memory or NIC
In some cases, buffered transfers may be higher-performing (e.g. “inline”)
Registration option for local NIC access
Migration to fabric managed registration
Required registration for remote access
Specify permissions
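
The choice among these buffering options can be sketched as a per-transfer decision. This assumes a device-specific inline limit and a flag telling whether the source buffer is registered; the names are hypothetical and the policy is only one plausible ordering of the options.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffering decision for one outbound transfer.
 * `inline_max` is an assumed, device-specific limit. */
enum xfer_path {
    PATH_INLINE,     /* payload copied into the command itself (buffered) */
    PATH_ZERO_COPY,  /* NIC reads the app's registered buffer directly */
    PATH_BOUNCE      /* provider copies into its own network buffer first */
};

enum xfer_path choose_path(size_t len, size_t inline_max, bool registered)
{
    /* Small payloads: a copy may cost less than a separate NIC fetch, and
     * the app can reuse its buffer as soon as the call returns. */
    if (len <= inline_max)
        return PATH_INLINE;
    /* Larger payloads from registered memory avoid copies entirely. */
    if (registered)
        return PATH_ZERO_COPY;
    /* Otherwise fall back to network buffering for ease of use. */
    return PATH_BOUNCE;
}
```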
 
Presentation Transcript


Base Class

Close – Destroy / free object
Bind – Create an association between two object instances
Sync – Fencing operation that completes only after previously issued asynchronous operations have completed
Control – (~fcntl) set/get low-level object behavior
I/F Open – Open provider extended interfaces
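
A common way to express this kind of base class in C is an ops table reached from a header that every object begins with, with derived objects embedding that header as their first member. The sketch below assumes that approach; every name in it is hypothetical, and I/F Open is omitted for brevity.

```c
#include <stdlib.h>

/* Hypothetical "base class": every object begins with a header whose ops
 * table supplies Close, Bind, Sync and Control. */
struct obj;

struct obj_ops {
    int (*close)(struct obj *self);
    int (*bind)(struct obj *self, struct obj *other);
    int (*sync)(struct obj *self);
    int (*control)(struct obj *self, int cmd, void *arg);
};

struct obj {
    const struct obj_ops *ops;
};

/* Generic entry points work on any derived object. */
int obj_close(struct obj *o) { return o->ops->close(o); }
int obj_bind(struct obj *o, struct obj *other) { return o->ops->bind(o, other); }
int obj_sync(struct obj *o) { return o->ops->sync(o); }
int obj_control(struct obj *o, int cmd, void *arg) { return o->ops->control(o, cmd, arg); }

/* A derived object embeds the base as its first member ("inheritance"). */
struct counter_obj {
    struct obj base;
    unsigned long value;
    struct obj *bound;     /* e.g. the endpoint whose completions it counts */
};

enum { CTRL_GET_VALUE = 1, CTRL_SET_VALUE = 2 };

static int counter_close(struct obj *self) { free(self); return 0; }

static int counter_bind(struct obj *self, struct obj *other)
{
    ((struct counter_obj *)self)->bound = other;
    return 0;
}

/* Nothing is ever outstanding in this sketch, so the fence completes at once. */
static int counter_sync(struct obj *self) { (void)self; return 0; }

static int counter_control(struct obj *self, int cmd, void *arg)
{
    struct counter_obj *c = (struct counter_obj *)self;
    switch (cmd) {
    case CTRL_GET_VALUE: *(unsigned long *)arg = c->value; return 0;
    case CTRL_SET_VALUE: c->value = *(unsigned long *)arg; return 0;
    default: return -1;    /* unknown low-level command */
    }
}

static const struct obj_ops counter_ops = {
    counter_close, counter_bind, counter_sync, counter_control
};

struct obj *counter_open(void)
{
    struct counter_obj *c = calloc(1, sizeof *c);
    if (!c)
        return NULL;
    c->base.ops = &counter_ops;
    return &c->base;
}
```

Casting between struct obj * and struct counter_obj * is valid here because the base is the first member, so a caller can treat every object uniformly through the obj_* wrappers.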

Fabric

Domain – Open a resource domain
Endpoint – Create a listening EP for connection-oriented protocols
EQ Open – Open an event queue for listening EP or reporting fabric events

Resource Domain

Query – Obtain domain-specific attributes
Open AV, EQ, EP, SRQ, EQ Group – Create an address vector, event or completion counter, event queue, endpoint, shared receive queue, or EQ group
MR Ops – Register data buffers for access by fabric resources

Address Vector

Insert – Insert one or more addresses into the vector
Remove – Remove one or more addresses from the vector
Lookup – Return a stored address
Straddr – Convert an address into a printable string

Base EP

Enable – Enables an active EP for data transfers
Cancel – Cancel a pending asynchronous operation
Getopt – (~getsockopt) get protocol-specific EP options
Setopt – (~setsockopt) set protocol-specific EP options

Passive EP

Getname – (~getsockname) return EP address
Listen – Start listening for connection requests
Reject – Reject a connection request

Active EP

CM – Connection establishment ops, usable by connection-oriented and connectionless endpoints
MSG – 2-sided message queue ops, to send and receive messages
RMA – 1-sided RDMA read and write ops
Tagged – 2-sided matched message ops, to send and receive messages (conceptual merge of messages and RMA writes)
Atomic – 1-sided atomic ops
Triggered – Deferred operations initiated on a condition being met

Shared Receive Queue

Receive – Post buffer to receive data

Event Queue

Read – Retrieve a completion event, and optional source endpoint address data for received data transfers
Read Err – Retrieve event data about an operation that completed with an unexpected error
Write – Insert an event into the queue
Reset – Direct the EQ to signal its wait object when a specified condition is met
Strerror – Convert error data associated with a completion into a printable string

EQ Group

Poll – Check EQs for events
Wait – Wait for an event on the EQ group

Completion Counter

Read – Retrieve a counter's value
Add – Increment a counter
Set – Set / clear a counter's value
Wait – Wait until a counter reaches a desired threshold
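
The four counter operations can be sketched with a C11 atomic standing in for the conceptual hardware counter. The wait below assumes no native wait object is available, so it polls and calls a caller-supplied progress callback between checks, in the spirit of implicit progress; the names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion counter supporting Read, Add, Set and Wait. */
struct cntr {
    _Atomic uint64_t value;
};

void cntr_init(struct cntr *c) { atomic_init(&c->value, 0); }
uint64_t cntr_read(struct cntr *c) { return atomic_load(&c->value); }
void cntr_add(struct cntr *c, uint64_t n) { atomic_fetch_add(&c->value, n); }
void cntr_set(struct cntr *c, uint64_t v) { atomic_store(&c->value, v); }

/* Wait until the counter reaches `threshold`.  Polls up to `max_polls`
 * times, calling `progress` (a stand-in for the provider's progress
 * engine; may be NULL) between checks.  Returns 0 once the threshold is
 * reached, -1 on timeout. */
int cntr_wait(struct cntr *c, uint64_t threshold,
              void (*progress)(void *), void *arg, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (cntr_read(c) >= threshold)
            return 0;
        if (progress)
            progress(arg);
    }
    return cntr_read(c) >= threshold ? 0 : -1;
}
```

Because the count is independent of EQ events, an application can wait for “N operations done” without reading or freeing N completion entries.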

Memory Region

Desc – (~lkey) Optional local memory descriptor associated with a data buffer
Key – (~rkey) Protection key against access from remote data transfers
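
The role of the remote key can be sketched as a lookup that checks the key, the requested range, and the registered permissions before a remote read or write is allowed. This is a minimal model under stated assumptions: keys are small sequential integers (a real provider would want hard-to-guess keys), key 0 signals failure, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-region table protecting registered memory with a
 * remote key (~rkey) and per-region permissions. */
enum { ACCESS_REMOTE_READ = 1, ACCESS_REMOTE_WRITE = 2 };

#define MR_MAX 8

struct mr {
    uintptr_t base;
    size_t len;
    unsigned access;   /* permitted remote operations */
    uint64_t key;
    int valid;
};

struct mr_table {
    struct mr mrs[MR_MAX];
    uint64_t next_key;
};

/* Register [buf, buf + len) with the given remote permissions.
 * Returns the key to hand to peers, or 0 if the table is full. */
uint64_t mr_reg(struct mr_table *t, void *buf, size_t len, unsigned access)
{
    for (size_t i = 0; i < MR_MAX; i++) {
        if (!t->mrs[i].valid) {
            uint64_t key = ++t->next_key;   /* sketch only: predictable */
            t->mrs[i] = (struct mr){ (uintptr_t)buf, len, access, key, 1 };
            return key;
        }
    }
    return 0;
}

/* Validate an incoming remote request: the key must exist, the range must
 * lie inside the region, and every requested permission must be granted.
 * Returns 0 if allowed, -1 otherwise. */
int mr_check_remote(const struct mr_table *t, uint64_t key,
                    uintptr_t addr, size_t len, unsigned access)
{
    for (size_t i = 0; i < MR_MAX; i++) {
        const struct mr *m = &t->mrs[i];
        if (!m->valid || m->key != key)
            continue;
        if (addr < m->base || len > m->len || addr - m->base > m->len - len)
            return -1;   /* out of bounds (checked without overflow) */
        if ((m->access & access) != access)
            return -1;   /* permission not granted at registration */
        return 0;
    }
    return -1;           /* unknown key */
}
```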
