Design and Evaluation of Hierarchical Rings with Deflection Routing


This research explores the implementation of Hierarchical Rings with Deflection (HiRD) routing as a solution to the performance and energy inefficiencies found in traditional hierarchical ring designs. HiRD guarantees livelock freedom and efficient delivery while simplifying the network structure by eliminating buffers at local routers. The study shows that HiRD provides higher performance and energy efficiency compared to standard hierarchical rings, offering a promising approach for scalable network architectures.


Uploaded on Sep 12, 2024



Presentation Transcript


  1. Design and Evaluation of Hierarchical Rings with Deflection Routing Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu

  2. Executive Summary: Rings do not scale well as core count increases. Traditional hierarchical ring designs are complex and energy inefficient (complicated buffering and flow control). Solution: Hierarchical Rings with Deflection (HiRD). HiRD guarantees livelock freedom and delivery; eliminates all buffers at local routers and most buffers at bridge routers. HiRD provides higher performance and energy efficiency than hierarchical rings. HiRD is simpler than hierarchical rings.

  3. Outline: Background and Motivation (current); Key Idea: Deflection Routing; End-to-end Delivery Guarantees; Our Solution: HiRD; Results; Conclusion

  4. Scaling Problems in a Ring NoC: As the number of cores grows: lower performance; more power.

  5. Alternative: Hierarchical Designs. [Diagram labels: Local Ring (Level 0), Global Ring (Level 1).] Packets can reach a far destination in fewer hops.
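The "fewer hops" claim can be illustrated with a small worst-case hop count. This is a minimal sketch under assumptions that are not stated on the slide: rings are unidirectional, each local ring has exactly one bridge stop, and the function names are hypothetical.

```python
def single_ring_worst_hops(nodes: int) -> int:
    """Worst-case hops on a unidirectional ring with `nodes` stops (assumed model)."""
    return nodes - 1


def hierarchical_worst_hops(local_rings: int, cores_per_ring: int) -> int:
    """Worst-case hops between cores on different local rings.

    Assumed model: every local ring holds `cores_per_ring` cores plus one
    bridge stop, and the global ring has one stop per local ring.
    """
    to_bridge = cores_per_ring       # source core to its bridge
    across_global = local_rings - 1  # bridge to bridge on the global ring
    from_bridge = cores_per_ring     # bridge to destination core
    return to_bridge + across_global + from_bridge


if __name__ == "__main__":
    for rings, cores in ((4, 4), (8, 8)):
        total = rings * cores
        print(total, single_ring_worst_hops(total), hierarchical_worst_hops(rings, cores))
```

Under this model the gap widens with size: 15 versus 11 hops at 16 cores, and 63 versus 23 hops at 64 cores.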

  6. Single Ring vs. Hierarchical Rings. [Figure: normalized system performance vs. network size (4x4, 8x8) for Ring 64-bit, Ring 128-bit, Ring 256-bit, and Hring.] A hierarchical design provides better performance as the network scales.

  7. Complexity in Hierarchical Designs: Complex buffering and flow control.

  8. Single Ring vs. Hierarchical Rings. [Figure: normalized network power vs. network size (4x4, 8x8) for Ring 64-bit, Ring 128-bit, Ring 256-bit, and Hring.] Design complexity increases power consumption.

  9. Our Goal: Design a hierarchical ring that has lower complexity without sacrificing performance.

  10. Outline: Background and Motivation; Key Idea: Deflection Routing (current); End-to-end Delivery Guarantees; Our Solution: HiRD; Results; Conclusion

  11. Key Idea: Eliminate buffers; use deflection routing; simpler flow control.

  12. Local Router. [Diagram labels: Local West, Local East, Core.] Key functionality: accept new flits; pass flits around the ring.

  13. Eliminating Buffers in Local Routers. [Diagram labels: Local West, Local East, Core.]

  14. Eliminating Buffers in Local Routers. [Diagram labels: Ejector, Local West, Local East, Core.] Flits can enter the ring if the output is available. No buffer; simpler crossbar.

  15. Deflection Routing. [Diagram labels: Ejector, Local West, Local East, Deflected, Core.]
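The bufferless local router described on slides 12 to 15 can be sketched as a single-cycle step function. This is an illustrative model, not the authors' RTL: the `Flit` type, the `local_router_step` name, and the `ejector_free` flag (standing in for contention on the ejector) are all assumptions.

```python
from collections import deque
from typing import Deque, NamedTuple, Optional, Tuple


class Flit(NamedTuple):
    dest: int      # destination node id
    payload: str


def local_router_step(
    node_id: int,
    incoming: Optional[Flit],
    inject_queue: Deque[Flit],
    ejector_free: bool = True,
) -> Tuple[Optional[Flit], Optional[Flit]]:
    """Simulate one cycle; returns (flit sent onward on the ring, flit ejected)."""
    ejected: Optional[Flit] = None
    outgoing = incoming
    if incoming is not None and incoming.dest == node_id and ejector_free:
        # The flit has arrived: hand it to the core and free the ring slot.
        ejected, outgoing = incoming, None
    # Otherwise the flit stays on the ring. If it was destined here but the
    # ejector was busy, this is a deflection: it goes around the ring again.
    if outgoing is None and inject_queue:
        # With no buffers, the core may inject only into an empty output slot.
        outgoing = inject_queue.popleft()
    return outgoing, ejected
```

For example, calling `local_router_step(1, Flit(1, "mine"), queue, ejector_free=False)` returns the same flit as the outgoing flit, which is the deflected case shown on slide 15.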

  16. Bridge Router. [Diagram labels: Global Ring (West, East), Crossbar, Local Ring (West, East).]

  17. Eliminating Buffers in Bridge Routers. [Diagram labels: Global Ring (West, East), Crossbar, Local Ring (West, East).]

  18. Eliminating Buffers in Bridge Routers. [Diagram labels: Global Ring (West, East), Local Ring (West, East).] Simpler buffering; fewer buffers; simpler crossbar.

  19. Outline: Background and Motivation; Key Idea: Deflection Routing; End-to-end Delivery Guarantees (current); Our Solution: HiRD; Results; Conclusion

  20. Livelock in Deflection Routing: Injection starvation. [Diagram labels: Ring, Src, Starved Flit, Unable to inject.]

  21. HiRD: Injection Guarantee. After 150 cycles: all nodes stop injecting flits. [Diagram labels: Ring, Src, Starved Flit, Unable to inject, Throttled Router.] Throttling provides injection guarantee.
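The throttling rule on this slide can be sketched as a per-node starvation counter. Only the 150-cycle threshold comes from the slide; the class name, how the throttle signal reaches other nodes, and the rule that throttling lifts as soon as the starved node injects are assumptions made for illustration.

```python
from typing import List, Set

STARVATION_THRESHOLD = 150  # cycles, from the slide


class InjectionThrottle:
    """Hypothetical bookkeeping for the injection guarantee."""

    def __init__(self, num_nodes: int, threshold: int = STARVATION_THRESHOLD):
        self.threshold = threshold
        self.starved_cycles: List[int] = [0] * num_nodes

    def record_cycle(self, node: int, wanted_to_inject: bool, injected: bool) -> None:
        """Update the node's counter once per cycle."""
        if injected or not wanted_to_inject:
            self.starved_cycles[node] = 0
        else:
            self.starved_cycles[node] += 1

    def starved_nodes(self) -> Set[int]:
        return {n for n, c in enumerate(self.starved_cycles) if c >= self.threshold}

    def may_inject(self, node: int) -> bool:
        """While any node is starved, only starved nodes may inject."""
        starved = self.starved_nodes()
        return not starved or node in starved
```

Once the other nodes stop injecting, empty slots circulate on the ring, so the starved node eventually finds a free output and its counter resets.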

  22. Livelock in Deflection Routing: Transfer starvation. [Diagram labels: Ring, Transfer FIFO, Starved Flit, Unable to Transfer.]

  23. HiRD: Transfer Guarantee. After 10 looparounds. [Diagram labels: Ring, Starved Flit, Reserved Slot, Transfer FIFO.] Reservation provides transfer guarantee.
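The reservation rule can be sketched as a transfer FIFO that counts failed transfer attempts per flit. The 10-looparound threshold is from the slide; the `TransferFifo` class, the single-reservation policy, and counting one failed attempt per lap are assumptions.

```python
from collections import deque
from typing import Deque, Dict, Optional

LOOPAROUND_THRESHOLD = 10  # from the slide


class TransferFifo:
    """Hypothetical bridge-router transfer FIFO with slot reservation."""

    def __init__(self, capacity: int, threshold: int = LOOPAROUND_THRESHOLD):
        self.capacity = capacity
        self.threshold = threshold
        self.slots: Deque[str] = deque()
        self.laps: Dict[str, int] = {}           # flit id -> failed attempts
        self.reserved_for: Optional[str] = None  # flit holding the reservation

    def try_transfer(self, flit_id: str) -> bool:
        """Return True if the flit enters the FIFO, False if it is deflected."""
        has_space = len(self.slots) < self.capacity
        allowed = self.reserved_for in (None, flit_id)
        if has_space and allowed:
            self.slots.append(flit_id)
            self.laps.pop(flit_id, None)
            if self.reserved_for == flit_id:
                self.reserved_for = None
            return True
        # Deflected: the flit loops around the ring and tries again later.
        self.laps[flit_id] = self.laps.get(flit_id, 0) + 1
        if self.reserved_for is None and self.laps[flit_id] >= self.threshold:
            self.reserved_for = flit_id  # the next free slot is held for it
        return False

    def drain_one(self) -> Optional[str]:
        """The other ring consumes one flit, freeing a slot."""
        return self.slots.popleft() if self.slots else None
```

While a reservation is active, other flits are deflected even if a slot is free, which bounds how long the starved flit can wait.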

  24. Ejection Guarantee: Provided by a prior work, Re-transmit once [Fallin et al., HPCA 11]. Drop a flit if there is no available slot; reserve a buffer slot at the destination if a flit was dropped.
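The drop-and-reserve behavior can be sketched from the destination's point of view. This is a simplified model based only on the two bullets above; the `ReassemblyBuffers` class and the way a freed slot is handed to the oldest dropped flit are assumptions, not the mechanism as specified in the cited work.

```python
from collections import deque
from typing import Deque, Optional, Set


class ReassemblyBuffers:
    """Hypothetical destination-side buffers with drop and reservation."""

    def __init__(self, capacity: int):
        self.free = capacity
        self.waiting: Deque[str] = deque()  # dropped flits awaiting a slot
        self.reserved: Set[str] = set()     # flits that now own a reserved slot

    def receive(self, flit_id: str) -> bool:
        """Return True if the flit is accepted, False if it is dropped."""
        if flit_id in self.reserved:
            self.reserved.remove(flit_id)   # the retransmission uses its slot
            return True
        if self.free > 0:
            self.free -= 1
            return True
        self.waiting.append(flit_id)        # remember the drop
        return False

    def release(self) -> Optional[str]:
        """A slot frees up; return the flit to ask for retransmission, if any."""
        if self.waiting:
            flit_id = self.waiting.popleft()
            self.reserved.add(flit_id)      # hold the freed slot for this flit
            return flit_id
        self.free += 1
        return None
```

Because the slot is already reserved when the sender is asked to resend, the retransmitted flit cannot be dropped again, so a single retransmission suffices.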

  25. End-to-end Delivery Guarantees. [Diagram labels: src, dest, Local Ring, Global Ring, Injection Guarantee, Transfer Guarantee, Ejection Guarantee.]

  26. Outline: Background and Motivation; Key Idea: Deflection Routing; End-to-end Delivery Guarantees; Our Solution: HiRD (current); Results; Conclusion

  27. An Overview of HiRD: Deflection routing; simpler flow control; simpler crossbars and control logic. No buffers in the local rings; simpler and faster local routers; simpler bridge routers. Lower power, less area, and simpler to design. Provides end-to-end delivery guarantees: injection guarantee by throttling; transfer guarantee by reservation.

  28. Putting It All Together: Deflection routing; simpler flow control; simpler crossbars and control logic. No buffers in the local rings; simpler and faster local routers; simpler bridge routers. Lower power, less area, and simpler to design. Provides end-to-end delivery guarantees: injection guarantee by throttling; transfer guarantee by reservation.

  29. Outline: Background and Motivation; Key Idea: Deflection Routing; End-to-end Delivery Guarantees; Our Solution: HiRD; Results (current); Conclusion

  30. Methodology. Cores: 16 and 64 OoO CPU cores; 64 KB 4-way private L1; distributed L2. Network: 1-flit local-to-global buffer; 4-flit global-to-local buffers; 2-cycle per-hop latency for local routers; 3-cycle per-hop latency for global routers. 60 workloads consisting of SPEC2006 apps.

  31. Comparison to Previous Designs. Single ring design (Kim and Kim, NoCArc 09): 64-bit links; 128-bit links; 256-bit links. Buffered hierarchical ring design (Ravindran and Stumm, HPCA 97): identical topology; identical bisection bandwidth; 4-flit buffers in both local and global routers.

  32. Results: System Performance. [Figure: normalized system performance vs. network size (4x4, 8x8) for Ring 64-bit, Ring 128-bit, Ring 256-bit, Hring, and HiRD; annotations: 1.9%, 2.9%.] 1) Hierarchical designs provide better performance than a single ring on a larger network. 2) HiRD performs better than buffered hierarchical rings due to lower latency in local routers and throttling.

  33. Results: Network Power. [Figure: normalized network power vs. network size (4x4, 8x8) for Ring 64-bit, Ring 128-bit, Ring 256-bit, Hring, and HiRD; annotations: 46.6%, 15%.] 1) Hierarchical designs consume much less power than the highest-performance single ring. 2) Routers and flow control in HiRD are simpler than routers in buffered hierarchical rings.

  34. Router Area and Critical Path: 16-node network with 8 bridge routers; Verilog RTL design using 45nm technology. HiRD reduces NoC area by 50.3% compared to a buffered hierarchical ring design. HiRD reduces local router critical path by 29.9% compared to a buffered hierarchical ring design.

  35. Additional Results: Detailed power breakdown; synthetic evaluations; energy efficiency results; worst-case analysis. Technical Report: multithreaded evaluation; average, 90th percentile, and max latency; comparison against other topologies; sensitivity analysis on different link bandwidths and number of buffers.

  36. Outline: Background and Motivation; Key Idea: Deflection Routing; End-to-end Delivery Guarantees; Our Solution: HiRD; Results; Conclusion (current)

  37. Conclusion: Rings do not scale well as core count increases. Traditional hierarchical ring designs are complex and energy inefficient (complicated buffering and flow control). Solution: Hierarchical Rings with Deflection (HiRD). HiRD guarantees livelock freedom and delivery; eliminates all buffers at local routers and most buffers at bridge routers. HiRD provides higher performance and energy efficiency than hierarchical rings. HiRD is simpler than hierarchical rings.

  38. Design and Evaluation of Hierarchical Rings with Deflection Routing Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu

  39. Backup Slides

  40. Network Intensive Workloads: 15 network intensive workloads

  41. System Performance. [Figure: normalized system performance vs. network size (4x4, 8x8) for Ring 64-bit, Ring 128-bit, Ring 256-bit, Hring, and HiRD; annotations: 2.5%, 5.6%.] Deflections balance out the network load. Throttling reduces congestion.

  42. Network Power. [Figure: normalized network power vs. network size (4x4, 8x8) for Ring 64-bit, Ring 128-bit, Ring 256-bit, Hring, and HiRD; annotations: 37%, 11.9%.] More deflections happen when the network is congested.

  43. Detailed Results

  44. Multithreaded Applications

  45. Network Latency

  46. Synthetic Traffic Evaluations

  47. Topology Comparison

  48. Sweep over Different Bandwidth

  49. Packet Reassembly: Borrowed from CHIPPER [Fallin et al., HPCA 11]. Retransmit-Once: the destination node reserves a buffer slot for a dropped packet. Provides ejection guarantee.

  50. Other Optimizations: Map cores that communicate with each other frequently onto the same local ring. This takes advantage of the faster local ring routers.
