Revisiting Complexity of Hardware Cache Coherence in Computer Science
Today's shared memory systems face increasing complexity in cache coherence protocol implementations, posing significant challenges in verification and optimization. This study re-evaluates the complexities of existing protocols like MESI and introduces an alternative approach called DeNovo, focusing on disciplined programming and model checking to enhance performance and reduce errors.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Department of Computer Sciences Revisiting the Complexity of Hardware Cache Coherence and Some Implications Rakesh Komuravelli Sarita Adve, Ching-Tsun Chou University of Illinois @ Urbana-Champaign, Intel denovo@cs.illinois.edu
Department of Computer Sciences Motivation Today s shared memory systems are more complex than ever Implementing cache coherence protocols is a major challenge Tens of transient states, hard to test races and add optimizations Have we tamed the protocol complexity yet? Verified a state-of-the-art implementation of MESI from GEMS Found six bugs even after 4+ years of usage worldwide Current verification techniques are insufficient Scalable but hard to use or error prone (e.g. parametric verification) Protocols designed for verification impact perf. (e.g., fractals) Are there any alternatives? 2
Department of Computer Sciences An alternative approach DeNovo: a h/w-s/w co-designed protocol [PACT 2011] Assumes disciplined programming eliminating data races Simple protocol, yet providing performance and power advantages Model checked DeNovo Found three bugs: easy to fix; implementation errors 15X fewer reachable states compared to MESI Focus of the talk: Understand what makes hardware protocols complex Experiences with verifying MESI and DeNovo 3
Department of Computer Sciences Outline Motivation Understanding hardware protocol complexity Verification model and findings Conclusions
Department of Computer Sciences Why are hardware protocols complex? Invalid Shared Exclusive Modified Text book protocol for MESI 5
Department of Computer Sciences Why are hardware protocols complex? Writek Writek Invalid Shared Readi Readi Writek [sharers exist] Writei Readk Readk Readi Writei [no sharers] Exclusive Modified Readi Writei Readi/ Writei Text book protocol for MESI In reality, the actual implementation is a lot more complex 6
Department of Computer Sciences Why are hardware protocols complex? Writek Writek Invalid Shared Readi Readi Writek [sharers exist] Writei Readk Readk Readi Writei [no sharers] Exclusive Modified Readi Writei Readi/ Writei Text book protocol for MESI In reality, the actual implementation is a lot more complex 7
Department of Computer Sciences Example transition for MESI Store L1P2 L1P1 L1Pn Initial state at L1 Invalid (I) Shared (S) Shared (S) Acks GETX Transient_1 Invalid (I) Invalid (I) Transient_2 On last Ack Invalidations Modified (M) Data Initial state at L2 LL2/Directory Shared (SS) Transient_3 L1Modified(MT) Exclusive_Unblock 8
Department of Computer Sciences Example transition for MESI Store L1P2 L1P1 L1Pn Initial state at L1 Invalid (I) Shared (S) Shared (S) Acks GETX Transient_1 Invalid (I) Invalid (I) Transient_2 On last Ack Invalidations Modified (M) Data Initial state at L2 LL2/Directory Shared (SS) Transient_3 L1Modified(MT) Exclusive_Unblock One transition requires three transient states (total 21) Transient states Hardware races Software data races 9
Department of Computer Sciences DeNovo with zero transient states Assumes data-race-free software Completely eliminates transient states from the protocol Exploits s/w information for simple coherence enforcement Invalidate stale copies in private caches Caches selectively self-invalidate entries not written by self No sharers list Track up-to-date copy Directory keeps track of one up-to-date copy Readi Readi Invalid Valid Writei Writei Writek Registered Readi, Writei Readk 10
Department of Computer Sciences Example transition for DeNovo Store L1P2 L1P1 L1Pn Initial state at L1 Invalid (I) Valid (V) Valid (V) Self- Invalidations Registered (R) Invalid (I) Invalid (I) Registration request Initial state at L2 Valid (V) Registered (R) LL2/Directory Zero transient states => a simplified protocol 11
Department of Computer Sciences Outline Motivation Understanding hardware protocol complexity Verification model and findings Conclusions
Department of Computer Sciences Verification model Mur model checking tool Verified DeNovo and MESI protocols State-of-the art GEMS implementation Abstract model Single address, two data values Two cores with private L1 and unified L2, unordered n/w Data-race-free assumption for DeNovo 13
Department of Computer Sciences Results Correctness Six bugs in MESI protocol Two deadlock scenarios Unhandled races due to L1 writebacks Several days to fix 14
Department of Computer Sciences A MESI bug L1P2 L1P1 Store Modified (M) Invalid (I) Replacement Transient_1 Transient_3 L1P1 PUTX DATA GETX Dangling message!! Invalid (I) Modified (M) Fwd_GETX Exclusive_Unblock ERROR!! L1Modified (MT) Modified at:P1 L1Modified (MT) Modified at:P2 Transient_2 LL2/Directory 2a Complex to identify the cause and fix Required adding multiple new state transitions 15
Department of Computer Sciences Results Correctness Six bugs in MESI protocol Two deadlock scenarios Unhandled races due to L1 writebacks Several days to fix and needed more transient states Three bugs in DeNovo protocol Mistakes in translation from high level specification Simple to fix Complexity 15x fewer reachable states for DeNovo 20x faster to verify for DeNovo DeNovo is simpler and needs reduced verification effort 16
Department of Computer Sciences Scalability results Extended the base model Two addresses instead of one DeNovo model finished without new bugs MESI model ran out of system memory (32GB) Need more scalable tools for non-experts 17
Department of Computer Sciences Conclusions Have we tamed the coherence protocol complexity yet? No! 6 bugs in a state-of-the-art MESI protocol in use for 4+ years Main source: transient states DeNovo: an alternative h/w-s/w co-designed approach 3 easy-to-fix bugs in an immature protocol Zero transient states MESI vs. DeNovo DeNovo has 15X fewer reachable states, 20X faster to verify Easy-to-use verification tools are not scalable Need better tools for non-experts 18