Innovative NoX Router: Transforming Low-Latency Router Techniques
Discover the groundbreaking NoX Router developed by Mitchell Hayenga and Mikko Lipasti from the Department of ECE. This router introduces a non-speculative control technique, enhancing efficiency by encoding and controlling traffic with XOR properties. By eliminating arbitration latency and dead cycles, the NoX Router showcases competitive frequency and remarkable throughput improvements. Motivated by the evolution of on-chip networks, this router offers a fresh approach to switch arbitration techniques, bringing a new paradigm to high-performance architecture research.
Presentation Transcript
The NoX Router. Mitchell Hayenga, Mikko Lipasti. Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE.
Overview. New low-latency router technique: don't arbitrate or speculate, encode. XOR property: (A^B) ^ B = A. Hides arbitration latency. Eliminates dead cycles. [Diagram labels: Control, Input Channel, Switch Fabric.] The NoX Router: single-cycle/wormhole/mesh implementation. Frequency competitive with pure speculative designs. 2.7%-34.4% better ED² on application traces. Up to 9.9% better throughput on synthetic traffic. (The NoX Router, Micro '11)
Motivation. Modern on-chip networks: bandwidth plentiful, latency critical. Control: complex, speculative, critical path. Datapath: fast, simple, wire-dominated. NoX tradeoff: a marginal increase in datapath complexity to hide control latency. [Figures: Intel Teraflops router; virtual channel router pipeline evolution, with stage labels BW RC VA SA ST LT, BW NRC VA SA ST LT, BW NRC VA SA ST LT, and VA NRC SA ST LT.]
Switch Arbitration Techniques. Non-speculative control: arbitration occurs before switch traversal. Speculative switch traversal [Mullins ISCA 2004]: assume contention doesn't happen; a cycle is wasted in the event of contention; the arbiter decides what gets sent on the next cycle. [Timing diagram: clk, port 0, port 1, grant, valid out, and data out over cycles 0-4, for the no-contention and contention cases of flits A and B through the switch fabric.]
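The wasted cycle under speculation can be illustrated with a toy trace of what one output link carries per cycle. This is only a sketch under assumptions that are not on the slide: the function names are hypothetical, `None` marks a dead cycle, and after a failed speculation the arbiter is assumed to serialize the remaining flits with no further waste. Clock period differences between the two schemes are ignored here.

```python
def non_speculative_trace(flits):
    """Arbitrate before traversal: every link cycle carries a useful flit."""
    return list(flits)


def speculative_trace(flits):
    """Speculate that there is no contention.

    Assumption: if more than one flit requests the output, the first cycle
    is wasted (None) and the arbiter's grants then order the contenders.
    """
    if len(flits) <= 1:
        return list(flits)
    return [None] + list(flits)


print(speculative_trace(["A"]))           # no contention: nothing is lost
print(speculative_trace(["A", "B"]))      # contention: one dead cycle first
print(non_speculative_trace(["A", "B"]))  # no dead cycle, but a longer clock period
```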
Switch Arbitration Techniques. Non-speculative control: arbitration occurs before switch traversal. Speculative switch traversal [Mullins ISCA 2004]: assume contention doesn't happen; a cycle is wasted in the event of contention; the arbiter decides what gets sent on the next cycle. Encoding: blindly transmit, XOR within the switch fabric. No contention: data sent unmodified. Contention: data sent XOR'd (A^B). The arbiter decides what was sent. [Timing diagram: clk, port 0, port 1, grant, valid out, and data out over cycles 0-4, for the no-contention and contention cases.]
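A minimal sketch of the encoding idea on this slide, modeling a single output port in software. The function name, the fixed-priority arbiter (first requester wins), and the integer flit values are assumptions for illustration and are not taken from the presentation.

```python
from functools import reduce
from operator import xor


def nox_output_stream(flits):
    """Return the (value, coded) pairs one output port would emit.

    Every cycle, all pending flits are driven blindly and XORed together.
    The arbiter then names one winner, which is treated as "what was sent"
    and removed; the losers are driven again on the next cycle.
    """
    pending = list(flits)
    stream = []
    while pending:
        value = reduce(xor, pending)   # XOR of all contenders in the fabric
        coded = len(pending) > 1       # a lone flit goes out unmodified
        stream.append((value, coded))
        pending.pop(0)                 # assumed fixed-priority arbiter
    return stream


A, B = 0xA5A5, 0x0F0F
print(nox_output_stream([A]))      # no contention: A sent as-is
print(nox_output_stream([A, B]))   # contention: A^B, then B
```

In this model no cycle on the link is wasted: two contending flits still leave in two cycles, with the first one in encoded form.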
Receive Logic. Works upon a simple XOR property: (A^B^C) ^ (B^C) = A. Simple decode: always able to decode by XORing two sequential values. Maintains the previous router's arbitration order/fairness. [Diagram: flit buffer with a coded bit, holding A^B^C, B^C, and C, decoded to A, B, and C.]
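The decode rule can be sketched as follows. This assumes each received flit carries a one-bit coded flag (the slide's diagram shows a "Coded" field) and that a coded flit is always followed by the same stream with the winner removed; the function name and sample values are hypothetical.

```python
def nox_decode(stream):
    """Recover the original flits from a list of (value, coded) pairs.

    A coded value is XORed with the value that follows it; an uncoded
    value is already the original flit.
    """
    decoded = []
    for i, (value, coded) in enumerate(stream):
        if coded:
            next_value = stream[i + 1][0]   # next sequential value
            decoded.append(value ^ next_value)
        else:
            decoded.append(value)
    return decoded


A, B, C = 0x11, 0x22, 0x44
received = [(A ^ B ^ C, True), (B ^ C, True), (C, False)]
print(nox_decode(received) == [A, B, C])
```

The flits come out in the order the upstream arbiter granted them (A, then B, then C), which is how the previous router's arbitration order is preserved.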
Tradeoffs and Scaling. Arbitration control: O(log n) delay for most arbiters. Decode logic: constant with respect to the number of ports. Switch fabric: XOR delay scales slightly worse than a mux/tristate-based solution; maybe not an issue (control latency). [Diagram labels: Input Channel, Switch Fabric.]
The NoX Router: Network of XORs. Implementation details: 8x8 mesh with 2 mm long, 64-bit links; single cycle (router + link); wormhole; dimension-ordered routing; minimally buffered.
Baseline Designs. Non-speculative: serial arbitration and switch logic; long cycle time; efficient link utilization. Speculative techniques [Mullins ISCA 2004]: hide arbitration latency; potential for wasted link bandwidth; Spec-Fast and Spec-Accurate [Mullins ASP-DAC 2006].
Frequency Analysis. Overheads present in all designs: 248 ps SRAM delay, 98 ps link latency.

Architecture       %        Clock Period
Non-Speculative    -        0.92 ns
Spec-Fast          33.3%    0.69 ns
Spec-Accurate      27.7%    0.72 ns
NoX                21.1%    0.76 ns
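The slide does not say what its % column measures. It appears consistent with the clock-frequency gain of each design over the Non-Speculative baseline, and the sketch below checks that reading using only the listed clock periods. The interpretation and all names in the code are assumptions.

```python
# Clock periods (ns) as listed on the slide.
CLOCK_PERIOD_NS = {
    "Non-Speculative": 0.92,
    "Spec-Fast": 0.69,
    "Spec-Accurate": 0.72,
    "NoX": 0.76,
}


def frequency_gain_percent(period_ns, baseline_ns):
    """Frequency increase over the baseline in percent (f = 1 / period)."""
    return (baseline_ns / period_ns - 1.0) * 100.0


baseline = CLOCK_PERIOD_NS["Non-Speculative"]
gains = {
    name: frequency_gain_percent(period, baseline)
    for name, period in CLOCK_PERIOD_NS.items()
}

for name, gain in gains.items():
    freq_ghz = 1.0 / CLOCK_PERIOD_NS[name]
    print(f"{name}: {freq_ghz:.2f} GHz, {gain:.1f}% vs. Non-Speculative")
```

Under this reading the computed gains land within about 0.1 percentage point of the slide's figures; the small residual for Spec-Accurate is presumably due to rounding of the published periods.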
Synthetic Traffic - Latency. [Two plots; axis label: bandwidth (MB/s/node).]
Synthetic Traffic - ED². [Two plots; axis label: bandwidth (MB/s/node).]
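The slides do not define ED². Assuming the conventional energy-delay-squared product, with E the energy and D the delay (both symbols introduced here for illustration), the metric and a small worked case are:

```latex
\[
  ED^2 = E \cdot D^2
\]
% Worked example: at equal energy, a 10% reduction in delay gives
\[
  \frac{E \cdot (0.9\,D)^2}{E \cdot D^2} = 0.81 ,
\]
% that is, a 19% lower (better) ED^2.
```

Because delay enters squared, the metric rewards latency reductions more heavily than equal-sized energy reductions.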
Application Traffic - Latency. [Plot.]
Application Traffic - ED². [Plot.]
Power @ Fixed Bandwidth. Traffic pattern: uniform random, 2 GB/s/node injection rate (Spec-Fast saturated). Switch/link glitching in the speculative designs. Marginal additional decode power (decode negligible).
Area. Floorplanning: Standard Router vs. NoX Router. NoX Router: ~17% more area. [Floorplan figures with labels: Decoding and Masking; Port 0 through Port 4, each with a 64x4 SRAM, in both routers; Crossbar; XOR Switch; dimensions 161.2 µm, 140 µm, 101.0 µm, 102.2 µm, 70 µm, 28 µm.]
Going Further. Input speedup: what if we could drive two values from an input buffer in a single cycle? The final decode step has 2 values available, so the last packet sees no additional delay from contention at the previous router. Multi-hop encoded forwarding: don't decode at every hop, decode when packets diverge; allow new collisions with the head flit; requires additional sideband info. [Diagram labels: Switch Fabric, Flit Buffer, A^B, A, B.]
Conclusion. New encoding-based low-latency router technique. Hides arbitration latency. Comparable frequency to speculative switch traversal techniques. Eliminates wasted interconnect bandwidth. Promising application to multiple router architectures.
Thanks. Questions?
Virtual Channels (Future Work). Physical channels vs. virtual channels. VC router benefits: dynamic bandwidth sharing (performance). VC router negatives: increased arbitration delay (performance); increased buffer energy (power); large unified crossbar (area, power). Possible, but tradeoffs need to be re-evaluated: structuring of input buffers/decode logic; VC credit accounting.
Multi-Flit Support. Current support is conservative: performs similarly to speculative routers if multi-flit packets collide. Not all bad though: ~70% of packets are single-flit coherence packets; only head-flit collisions matter; requests are all single-flit. Alternatives: fragment multi-flit packets; provide sufficient buffering space.