Open-Source General Partitioning Multi-Tool for VLSI Physical Design

Slide Note

An open-source tool called TritonPart offers a constraints-driven approach for general partitioning in VLSI physical design. It replaces hMETIS and is integrated with OpenROAD, providing features like multi-way partitioning and embedding-aware techniques. TritonPart shows significant improvements over hMETIS in terms of performance metrics. The tool addresses modern VLSI partitioning challenges like multi-dimensional balance and timing-driven constraints.

calliope Follow

Uploaded on May 12, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

An Open-Source Constraints-Driven General Partitioning Multi-Tool for VLSI Physical Design Ismail Bustany[1], Grigor Gasparyan[1], Andrew B. Kahng, Ioannis Koutis[2], Bodhisatta Pramanik and Zhiang Wang University of California San Diego [1] Advanced Micro Devices [2] New Jersey Institute of Technology

Overview TritonPart: open-source replacement for hMETIS in all contexts Integrated with OpenROAD GitHub: https://github.com/The-OpenROAD-Project/OpenROAD/tree/master/src/par Key features Real-valued + multi-dimensional vertex weights and balance constraints e.g., satisfy multi-FPGA balance Community constraints: groups of vertices that stay together during partitioning e.g., keep macros and their direct fanins/fanouts together Multi-way partitioning Embedding-aware partitioning e.g., placement coordinates Timing-driven partitioning e.g., minimize cuts on critical paths Slack propagation methodology Key results Improvements over hMETIS up to ~20% on some benchmarks ~21X reduction of cuts on timing-critical paths compared to hMETIS and KaHyPar 2

Outline Background and motivation Hypergraph partitioning Modern VLSI partitioning TritonPart framework Constrained and Timing-driven clustering Initial partitioning Multilevel and Timing-driven refinement Experimental setup and results Conclusion and future directions 3

Hypergraph partitioning Given a hypergraph, divide the vertices into disjoint blocks Optimize a cost function Satisfy constraints Classical balanced min-cut problem! Properties Classical hypergraph partitioning Scalar General hypergraph partitioning Vector Vertex weights Scalar Vector Hyperedge weights Integer Real Weights type Balance Modern constraints Constraints Min-cut Any cost function Cost objective hMETIS, KaHyPar, SpecPart, BiPart Available partitioners TritonPart this work (!) 4

Modern VLSI partitioning Multi-dimensional balance Multi-dimensional vertex weights multi-dimensional block weights Satisfy balance for each dimension of block weight Fixed vertices Some vertices preassigned to certain blocks Grouping Vertices in the same group should end up in the same block Embedding guidance Vertices closer in the embedding are more likely to be in the same block Timing Minimize cuts on timing-critical paths, prevent timing-noncritical paths from becoming critical, etc. Not handled by the SOTA + we want an open research foundation ! 5

Outline Background and motivation Hypergraph partitioning Modern VLSI partitioning TritonPart framework Constrained and Timing-driven clustering Initial partitioning Multilevel and Timing-driven refinement Experimental setup and results Conclusion and future directions 6

TritonPart framework Timing info Hypergraph/netlist, imbalance factor, #blocks, cost function Primary inputs Fixed, grouping, timing, embedding OpenSTA Optional inputs First-Choice clustering Constraints-aware Embedding guidance Timing-driven Multilevel constrained clustering Random, VILE, ILP Multi-dimensional balance Initial partitioning K-way FM Pairwise K-way FM Greedy refinement (HER) Cut-overlay Slack propagation (timing) Enhanced multilevel refinement Output partition 7

Multilevel Constrained Clustering Enhanced Multilevel Refinement Initial Constrained clustering Partitioning Multilevel clustering paradigm (similar to hMETIS + enhancements) Section III.A First-Choice (FC) clustering Generate vertex ordering For each vertex ? in the order: Find neighbors of ? Cluster ? with neighbor ? having highest connectivity score ?,?? ? ? Connectivity score (?,?) = ? {? ? ? ? } ? ? : Set of hyperedges incident to a vertex ? Fixed vertex constraints All fixed vertices in same block are merged into single cluster Grouping constraints All vertices belonging to same group are merged into single cluster ??: Hyperedge weight vector ?: Scaling factors vector ? Embedding score (?,?) = ? Embedding guidance Leverage embedding score + connectivity score during FC clustering After clustering, set cluster coordinate to the center of gravity of all vertices in the cluster ?? ??? ??: Embedding of vertex ? ?: Normalization factor 8

Multilevel Constrained Clustering Enhanced Multilevel Refinement Initial Timing-driven clustering Partitioning ??? ? ? Timing score: ? {? ? ? ? } We obtain net slacks and critical paths from the OpenSTA timer Timing cost of a hyperedge: ??= ?(??? ?????,???????? ??? ? ?? ?) Overall score: connectivity score + embedding score + timing score We use this score for clustering in TritonPart 9

Multilevel Constrained Clustering Enhanced Multilevel Refinement Initial Initial partitioning Partitioning Partitioning of coarsest hypergraph in the multilevel hierarchy Section III.B of paper Random: Randomly assign vertices to blocks VILE: Generate imbalanced partitions (from [5]) ILP: Solve ILP instance of the partitioning problem Optimize cutsize only Run if #vertices < 50 10

Multilevel Constrained Clustering Enhanced Multilevel Refinement Initial Multilevel refinement Partitioning Uncoarsen and refine partition along the multilevel hierarchy Section III.C of paper K-way pairwise FM (PM): Pairwise-restricted K-way FM (from [9]) Direct K-way FM Greedy hyperedge refinement (HER): Move groups of vertices belonging to cut hyperedges Cut-overlay clustering (From SpecPart) Above methods are applied in sequence 11

Multilevel Constrained Clustering Enhanced Multilevel Refinement Initial Timing-driven refinement Partitioning Optimize cutsize + timing cost in multilevel refinement Timing cost of a partition: ? ? = ???+ (?????? ? + ???(?)) ? ???? Timing cost of cutting hyperedges ? ? Timing cost of cutting timing paths Snaking factors cost ??= ? ??? ?????,???????? ??? ? ?? ? , ?? ? = Total number of block re-entries for ? #cuts on ? - 1 ?,? and ? are scaling factors Problem: optimize ? ? optimize cuts on top ? timing-critical paths Cutting a timing-noncritical path multiple times may turn it critical ??= ? ??? ????? 12

Multilevel Constrained Clustering Enhanced Multilevel Refinement Initial Slack propagation Partitioning Idea: introduce delay for a cut hyperedge + update slacks of all paths affected by delay ????: Cut hyperedges, ??????: Slack of path ?, ??: timing cost of a path, ??: timing cost of a hyperedge slack propagation Find all paths traversing ? Introduce extra delay ??????= ?????? ???? Cut hyperedge ? ?? Propagation Update slack of all timing paths Forward propagation of delay Timing path ? Backward propagation of delay Update ??,?? Slack propagation methodology Perform slack propagation after every FM pass; we set to 20% of clock period 13

Outline Background and motivation Hypergraph partitioning Modern VLSI partitioning TritonPart framework Constrained and Timing-driven clustering Initial partitioning Multilevel and Timing-driven refinement Experimental setup and results Conclusion and future directions 14

Results for classical hypergraph partitioning Benchmarks: Titan23 Murray et al. 2013 Previous state-of-the-art: hMETIS, KaHyPar, SpecPart Two-way, three-way, four-way cutsize presented On average. 8% better than hMETIS (for all benchmarks) On average 4.5% better than hMETIS and 1% better than SpecPart (for all benchmarks) Four-way Two-way Three-way Titan23 hMETIS SpecPart TritonPart hMETIS TritonPart hMETIS TritonPart stap_qrd 377 4235 3334 1514 585 464 1833 3363 1562 585 501 4149 4550 1945 895 715 5169 6235 2605 1304 370 1483 3314 1492 582 467 2446 4498 1879 898 696 2813 5591 1860 1123 gsm_switch LU230 bitcoin_miner bitonic_mesh 2% imbalance Best of 20 runs (averaged over 50 samples) 15

Cutsizes with embedding guidance Generate two-dimensional embedding from SpecPart On average ~11%, ~2%, ~8% better than hMETIS, SpecPart and embedding- unaware TritonPart for presented benchmarks Results for SpecPart are from a single run Two-way cutsizes Titan23 hMETIS SpecPart TritonPart TritonPartemb 837 805 835 dart 784 497 418 454 denoise 416 914 997 877 sparcT1_chip2 876 603 515 526 directrf 493 2% imbalance Best of 20 runs (averaged over 50 samples) 16

Metrics for timing-driven partitioning Metric 1: Cuts on timing-critical paths Average #cuts on each timing critical path Max. #cuts on any timing critical path Metric 2: Cuts on timing-noncritical paths that have turned critical Total number of such paths Average number of cuts on each path Top 100000 paths extracted using OpenSTA timer Comparison baselines: hMETIS, KaHyPar, TritonPart with timing-driven feature disabled (timing-reweighted hypergraph as input) Benchmarks: Modern benchmarks from the MacroPlacement repo. E.g., Mempool Group has ~2.7M instances (361K FFs, 324 macros) 17

Results for timing-driven partitioning (contd) Delay for each cut: 20% of clock period Evaluation of cuts on timing-critical paths Worst cut Average cuts Design hM KHPr TPart TPartt hM KHPr TPart TPartt BlackParrot (GF12) 0.13 0.29 0.66 1 1 3 0.06 1 MemPool-G (GF12) 0.03 0.21 0.19 2 5 2 0.00 0 Evaluation of cuts on timing-noncritical paths Average cuts Number of paths that turn critical Design hM KHPr TPart TPartt hM KHPr Tpart Tpartt BlackParrot (GF12) 13954 15511 21752 1.38 1.09 1.40 786 1.01 MemPool-G (GF12) 2084 11553 6441 1.86 2.01 1.32 97 1.00 18

Outline Background and motivation Hypergraph partitioning Modern VLSI partitioning TritonPart framework Constrained and Timing-driven clustering Initial partitioning Multilevel and Timing-driven refinement Experimental setup and results Conclusion and future directions 19

Conclusion and future directions Conclusion: We proposed TritonPart, the first constraints-driven general partitioner for VLSI physical design 21stcentury features (constraints + timing + embedding) Novel slack propagation methodology All codes and scripts on OpenROAD https://github.com/The-OpenROAD- Project/OpenROAD/tree/master/src/par Future directions: Improvement of core hypergraph partitioning framework Usage of GNNs/HNNs Improvement of core timing-driven framework Usage of accurate delay budgeting Spatial arrangement-aware partitioning framework Extension to 3D partitioning Runtime speedup (currently ~2.4X hMETIS runtime) 20

Open-Source General Partitioning Multi-Tool for VLSI Physical Design

Download Presentation

Presentation Transcript

Related

More Related Content