IP Fabric Architecture for GRNET Datacenters: Automating the Future

 
 
 
 
IP FABRIC ARCHITECTURE FOR GRNET: AUTOMATING THE DATACENTER

June 13, 2018, TNC18, Trondheim

Christos Argyropoulos
cargious@noc.grnet.gr
GRNET
 
The problem
 
Expanding from:
- Two small DCs
- One larger one: 22 racks
to three new datacenters:
- Athens: 36 racks (with the expansion)
- Knossos: 26 racks
- Louros: 14 racks

Network architecture?
- Address existing problems
- Balance between already tested and more innovative solutions
- Satisfy a new requirement: VLAN stretch between datacenters
 
 
Typical GRNET DC Rack
 
Virtual Machines Lane
- Debian, KVM, Ganeti, okeanos/vima
- No routing protocols between hosts and the network; simple Linux bridging or ARP proxying

Storage Lane
- Distributed object storage: Debian, Ceph/RADOS

Networking Lane

Also:
- Traditional SAN/NAS
- Baremetal servers or colocated third-party servers
- Monitoring stations, PDUs, TS, ...
 
 
GRNET DC Network
 
- Single switching fabric across the entire datacenter
  - VLAN stretching
- DC router(s)
  - Inter-VLAN routing
  - Routing with the GRNET IP core
  - Firewalling
- Server connectivity
  - Active/Active (LACP)
  - Active/Backup
  - Single-homed
 
 
[Diagram: DC network connected to the GRNET core]
 
Previous Architectures: Ethernet + IP
 
[Diagram: Core Router A and B, DC Router A and B, two stacked switches]
 
Legacy Ethernet + IP
- HW redundancy: two of everything
- First-hop redundancy: VRRP
- Redundant connections to the IP network
- No Spanning Tree

Limitations
- Servers are multihomed -> no LACP
- Poor link utilization (no active/active scenario)
- BUM & MAC-learning problems due to the topology
- Inter-DC VLAN stretch without redundancy (L2 VPNs)
- Mixed-mode stacking not so problem-free
 
 
Previous Architectures: (closed) Fabric architectures
 
 
Limitations:
- Complex implementations, often problematic at large scale
- Difficulty in debugging due to 'closed' proprietary solutions
- Often the platforms are immature, which results in bugs
- Eventually, too many hours wasted in troubleshooting and bringing the solution to a 'production ready' state
- All tested solutions were already outdated
- Looking for something new, to avoid vendor 'black box' solutions

[Diagram: aggregation and linecard switches joined by LACP, with a dedicated link for multi-chassis synchronization]
 
IP Fabric (aka IP Clos): the recipe
 
 
Overlay networking with EVPN as the control plane
- In theory, decouples the network from the physical hardware -> provisioned programmatically at a much larger scale
- All-active physical topologies
- Anycast Layer 3 gateways
- All traffic is L3, with VXLAN dataplane encapsulation for the overlay tunnels
- Limitation: network overhead, since all VM traffic is now encapsulated with a VXLAN header (+64 bits)
- No STP / no MC-LAG

Topology:
- Build a decades-old topology: IP Clos
- Make use of existing hardware: Juniper QFX5K as ToR switches
- Add two new powerhouse devices for the spine layer: Juniper QFX10K
 
VXLAN: Brief introduction
 
- RFC 7348
- Tunneling (overlay) protocol that encapsulates all traffic in IP/UDP
- Can be described as MAC-over-UDP with a globally unique identifier
- VLAN-like separation, according to the VXLAN ID
- Tunnels are built between VXLAN Tunnel Endpoints (VTEPs)
- A control plane is needed to minimize flooding and better facilitate learning
 
[Packet diagram: Outer IP | VXLAN VNID | Payload]
- Traffic forwarded between IP Fabric nodes is referred to as the underlay
- Service separation according to the VXLAN VNID
- All traffic is L3 -> no need for STP
 
EVPN: Brief introduction
 
Overview
- RFC 7432: BGP MPLS-Based Ethernet VPN
- Co-authored by Cisco, Juniper, Alcatel-Lucent, Verizon, AT&T
- Positioned as an evolution of existing L2VPN and VPLS solutions
- Can use both MPLS and VXLAN as transport
- Solves the flood-and-learn problem mentioned for VXLAN
- Provides redundant (anycast) gateways
- Active/Active server multihoming
 
 
Key Concepts
- Implemented as another BGP address family (NLRI)
- Introduces route types for Ethernet Segment (ES) auto-discovery and MAC/IP advertisement
- MAC addresses are treated as routable addresses and advertised via BGP
- BUM traffic and loop avoidance
- PE devices in the same ES are auto-discovered (allows active/active)
- Traffic is sent to the appropriate VTEP (no flooding)
- Route filtering & route distribution
 
GRNET IP Fabric topology
 
 
Spine & Leaf topology
- SPINE: Juniper QFX10002, n x 10G uplink to the GRNET core
- LEAF: Juniper QFX5100, 2 x 40G uplink
- Server: 2 x 1/10G UTP
  - Multihoming: in pairs of racks
  - LACP or Active-Backup

[Diagram: two core routers, two Juniper QFX10K spines, four Juniper QFX5K leaves]
 
 
The Underlay Network
 
[Diagram: underlay eBGP topology with one private ASN per device (65491, 65492 on the spines; 65401-65404 on the leaves)]
 
- Each IP Fabric device acts as an L3 node
- eBGP between the devices:
  - Route distribution of loopbacks
  - Multipath load balancing between the available paths
- Loopbacks & backbone links from 10.0.0.0/8
- One private AS per device
  - Unique assignments within GRNET (for future inter-DC connectivity)
- Loopback IPs & ASN help to identify the rack number
- Loopbacks are used as the VXLAN VTEPs (tunnel endpoints); a hypothetical YAML sketch of this scheme follows below
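As a purely illustrative sketch (device names, ASNs, and addresses below are invented; the real GRNET addressing plan is not shown in the slides), the underlay portion of the topology YAML could look like this:

```yaml
# Hypothetical underlay description: one private ASN and one loopback per device,
# all carved out of 10.0.0.0/8. Loopbacks double as the VXLAN VTEP addresses.
underlay:
  supernet: 10.0.0.0/8
  spines:
    spine1: { asn: 65491, loopback: 10.0.0.1/32 }
    spine2: { asn: 65492, loopback: 10.0.0.2/32 }
  leaves:
    leaf-rack01: { asn: 65401, loopback: 10.0.1.1/32 }   # ASN and IP encode the rack number
    leaf-rack02: { asn: 65402, loopback: 10.0.1.2/32 }
```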
 
 
 
 
 
The Overlay Network
 
[Diagram: overlay iBGP mesh, all fabric devices in AS 65499]
 
- iBGP mesh among all devices
  - An additional AS number for iBGP
  - Spines: route reflectors
  - EVPN address family (NLRI)
- EVPN: advertise MACs (...)
  - Each PE advertises its local MACs (per VXLAN)
  - L3 devices advertise MAC-IP bindings (per VXLAN)
- L3 (gateways) @ spines
  - Mostly because of limitations of the leaves (QFX5100)
  - Distributed gateway for redundancy and performance
- Server ports
  - Trunk or access
  - VLAN <-> VXLAN mapping
  - LACP with two PE devices + loop avoidance
A hypothetical YAML sketch of the overlay parameters is shown below.
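Continuing the illustration above (the overlay AS 65499 comes from the slide; all key names and VLAN/VNI values are assumptions), the overlay and service portions of the same YAML file could be expressed as:

```yaml
# Hypothetical overlay description: a single iBGP AS, spines as route reflectors,
# EVPN as the only overlay address family, and a VLAN <-> VXLAN (VNI) mapping.
overlay:
  asn: 65499                       # iBGP AS shared by all fabric devices
  route_reflectors: [spine1, spine2]
  address_families: [evpn]
  l3_gateways: spines              # distributed/anycast gateways live on the spines
services:
  - name: vm-lane
    vlan: 100                      # illustrative VLAN ID on the server ports
    vni: 10100                     # illustrative VXLAN VNI
    ports:
      - "leaf-rack01:xe-0/0/10"    # LACP pair toward a server multihomed to two leaves
      - "leaf-rack02:xe-0/0/10"
```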
 
 
 
 
L2 stretch between Datacenters
 
 
- Customer data routing is done at the spines
- Data center interconnect using 'asymmetric routing' -> the egress PE does only an L2 lookup in its local Ethernet switching table, which is populated from the EVPN control plane
- Spines are connected over the eBGP underlay to announce the VXLAN termination points (the IPs used for the overlay network)
- Spines are connected over the iBGP overlay to announce the MAC+IP NLRIs (EVPN)
 
 
 
[Diagram: inter-DC forwarding path, steps 1-5: ingress MAC-VRF -> IP-VRF, MAC rewrite, egress IP-VRF -> MAC-VRF]
 
Automation
 
Describe the topology, together with the addressing scheme, in one YAML file. A hypothetical top-level skeleton is sketched below.
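As a rough illustration only (the real dcf-topology schema is not shown in the slides), such a per-datacenter file could be laid out as follows, with the underlay/overlay sections holding data like the sketches on the previous slides:

```yaml
# Hypothetical top-level layout of the single topology file for one datacenter.
datacenter: athens
underlay: {}      # ASNs, loopbacks, point-to-point fabric links
overlay: {}       # iBGP AS, route reflectors, EVPN settings, gateway placement
services: []      # VLAN/VNI definitions and server port assignments
```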
 
 
Ansible and IP Fabric
 
New roles (templates + tasks) added to our Ansible playbooks:
- dcf-topology
- dcf-service
 
 
These roles allow new tasks to be executed and build the underlay/overlay topologies. A hypothetical playbook sketch is shown below.
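A minimal sketch of how such roles might be invoked from a playbook, assuming an inventory group for the fabric devices; the group name, variable file name, and role interfaces are assumptions, since only the role names dcf-topology and dcf-service appear in the slides:

```yaml
# Hypothetical playbook: render and push underlay/overlay and service configuration
# to all fabric devices over NETCONF. Names other than the two roles are invented.
- name: Build IP Fabric configuration
  hosts: ip_fabric            # assumed inventory group of spines and leaves
  connection: netconf         # the slides note NETCONF support from the beginning
  gather_facts: false
  vars_files:
    - topology.yml            # the single YAML file describing the fabric
  roles:
    - dcf-topology            # underlay eBGP + overlay iBGP/EVPN templates
    - dcf-service             # server interfaces, VLANs, L3 gateways
```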
 
Introduce a new service
 
Build server interfaces, VLANs, and redundant Layer 3 gateways via Ansible. A hypothetical service definition is sketched below.
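For illustration, a service entry of this kind might look as follows; every field name and value is assumed, since the slides do not show the actual dcf-service data model:

```yaml
# Hypothetical service definition consumed by the dcf-service role:
# one L2 segment with an anycast L3 gateway on the spines and two server ports.
- name: okeanos-vm-network
  vlan: 201                     # illustrative VLAN ID on the server-facing ports
  vni: 10201                    # illustrative VXLAN VNI for the overlay
  l3_gateway:
    address: 10.20.1.1/24       # anycast gateway IP, identical on both spines
  server_ports:
    - device: leaf-rack05
      interface: xe-0/0/15
      mode: trunk               # trunk or access, as on the overlay slide
      lacp: true                # LACP toward a server multihomed to two leaves
    - device: leaf-rack06
      interface: xe-0/0/15
      mode: trunk
      lacp: true
```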
 
 
Complete playbook run
 
Build one complete IP Fabric DC configuration and deploy L2 and L3 services in under 3 minutes!
In this example, Ansible produced configuration for 36 leaf and 2 spine switches, with 597 interfaces and 377 VLANs (!!!)
 
 
Reality: An adventure
 
- Addressing scheme for the VTEP IPs
  - In-band management -> loopbacks
  - Eventually a new carrier VRF to completely separate the management traffic
- Underlay ASNs
  - Multiply by the number of DCs
- Many limitations of the Broadcom chipset on the QFX5100
- Early adoption of the EVPN implementation on the QFX platforms
- Bugs...
- Easier troubleshooting due to the openness of the solution
- A lot of support/attention from Juniper (a win-win case)
- NETCONF support from the beginning: ease of service deployment and configuration changes with Ansible
 
 
Thank you!
 