Efficient Data Race Detection Using Transactional Memory

Slide Note
Embed
Share

This presentation discusses data race detection in multithreaded programs, exploring the impact of race conditions and the state-of-the-art dynamic data race detector solutions. It introduces a hybrid software and hardware approach leveraging Hardware Transactional Memory for lightweight data race detection, with a focus on low overhead and no custom hardware requirement. Transactions in Transactional Memory, challenges in pinpointing racy instructions, and solutions like FastTrack, Intel Inspector XE, Google Thread Sanitizer, ReEnact, CORD, and SigRace are also covered.


Uploaded on Jul 23, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory Tong Zhang, Dongyoon Lee, Changhee Jung Computer Science Department - 1 -

  2. Data Races in Multithreaded Programs Two threads access the same shared location (one is write) Not ordered by synchronizations Thread 1 Thread 2 p = NULL if (p) { crash fput(p, ) } MySQL bug #3596 - 2 -

  3. Race Conditions Caused Severe Problems 50+ million people lost power Cost an estimated $6 billion Northeast Blackout of 2003 About 30 million shares worth of trading were affected Cost an estimated $13 million Stock Price Mismatch in 2012 - 3 -

  4. State-Of-The-Art Dynamic Data Race Detector Software based solutions FastTrack [PLDI 09] Intel Inspector XE Google Thread Sanitizer ... Sound (no false negatives) Complete (no false positives) High overhead (10-100x) Hardware based solutions ReEnact [ISCA 03] CORD [HPCA 06] SigRace [ISCA 09] Low overhead Custom hardware - 4 -

  5. Our Approach Hybrid SW + (existing) HW solution Leverage the data conflict detection mechanism of Hardware Transactional Memory (HTM) in commodity processors for lightweight data race detection Low overhead No custom hardware - 5 -

  6. Outline Motivation Background: Transactional Memory TxRace: Design and Implementation Experiments Conclusion - 6 -

  7. Transactional Memory (TM) Allow a group of instructions (a transaction) to execute in an atomic manner Thread1 Thread2 Read(X) Transaction begin time Transaction end Write(X) Abort Data conflict Rollback Read(X) - 7 -

  8. Challenge 1: Unable to Pinpoint Racy Instructions When a transaction gets aborted, we know that there was a data conflict between transactions Thread1 Thread2 ? Read(X) Write(X) Abort However, we DO NOT know WHY and WHERE - e.g. which instruction? at which address? Which transaction caused the conflict? - 8 -

  9. Challenge 2: False Conflicts False Positives HTM detects data conflicts at the cache-line granularity False positives Thread1 Thread2 False transaction abort without data race Read(X) Write(Y) Abort X Y Cache line - 9 -

  10. Challenge 3. Non-conflict Aborts Best-effort (non-ideal) HTM with limitations Transaction may get aborted without data conflicts False negatives (if ignored) Read(X) Write(Y) Read(Z) . . . Read(X) Write(Y) Z I/O syscall() Y X Abort Abort Hardware Buffer Unknown Abort Capacity Abort - 10 -

  11. Outline Motivation Background: Transactional Memory TxRace: Design and Implementation Experiments Conclusion - 11 -

  12. TxRace: Two-phase Data Race Detection Potential data races Slow-path (SW-based) Fast-path (HTM-based) Fast Sound(no false negative) Complete(no false positive) Slow Unable to pinpoint races False sharing(false positive) Non-conflict aborts(false negative) Google ThreadSanitizer (TSan) Intel Haswell (RTM) - 12 -

  13. Compile-time Instrumentation Fast-path: convert sync-free regions into transactions Slow-path: add Google TSan checks Thread2 Thread1 Lock() Transaction begin Transaction end X=2 Sync-free X=1 Lock() Unlock() Sync-free Unlock() - 13 -

  14. Fast-path Slow-path Fast-path HTM-based Detection Leverage HW-based data conflict detection in HTM Problem: On conflict, one transaction gets aborted, but all others just proceed slow-path missed racy transactions Thread1 Thread2 Thread3 Already passed X=1 Abort X=2 - 14 -

  15. Fast-path Slow-path Fast-path HTM-based Detection Leverage HW-based data conflict detection in HTM Problem: On conflict, one transaction gets aborted, but all others just proceed Cannot switch to slow-path Solution: Abort in-flight transactions artificially Thread3 Thread1 Thread2 Rollback all R(TxFail) R(TxFail) R(TxFail) X=1 X=2 Abort Abort Abort W(TxFail) - 15 -

  16. Fast-path Slow-path Slow-path SW-based Detection Use SW-based sound and complete data race detection - Pinpoint racy instructions - Filter out false positives (due to false sharing) - Handle non-conflict (e.g., capacity) aborts conservatively Thread3 Thread1 Thread2 SW-based detection X=1 X=2 Abort Abort Abort - 16 -

  17. Implementation Two-phase data race detection Fast-path: Intel s Haswell Processor Slow-path: Google s Thread Sanitizer Instrumentation LLVM compiler framework Compile-time & profile-guided optimizations Evaluation PARSEC benchmark suites with simlarge input Apache web server with 300K requests from 20 clients 4 worker threads (4 hardware transactions) - 17 -

  18. Outline Motivation Background: Hardware Transactional Memory TxRace: Design and Implementation Experiments 1) Performance 2) Soundness (Detection capability) 3) Cost-effectiveness Conclusion - 18 -

  19. 1. Performance Overhead TSan TxRace 30 63x 1195x Runtime Overhead 25 >10x reduction 20 15 11.68x 10 4.65x 5 0 - 19 -

  20. 2. Soundness (Detection Capability) Recall:0.95 TSan TxRace 10 Number of Race detected 79 64 64 112 False Negative 8 6 4 2 0 - 20 -

  21. False Negatives Due to non-overlapped transactions X=1 Transaction begin Transaction end time X=2 - 21 -

  22. False Negatives Case Study in vips Repeat the experiment to exploit different interleaving 120 # of detected data races 100 All detected 80 60 40 20 0 1 2 3 4 5 6 7 # of iterations - 22 -

  23. 3. Cost-effectiveness Compared to Sampling TxRace vs. Tsan with Sampling Bodytrack Overhead 1 Runtime Overhead 0.8 0.69 TxRace 0.6 Overhead equivalent to na ve sampling at 25.5% 0.4 25.5% 0.2 TSan+Sampling 0 0 20 40 60 80 100 Sampling Rate (%) - 23 -

  24. Recall compared to sampling TxRace: Less overhead + High recall Recall =???????? ???? ???? ????? ????? ???? ???? ????? Bodytrack Recall 1 0.8 0.75 TxRace Spend 25.5% Get 47.2% 0.6 Recall 0.4 47.2% 0.2 TSan+Sampling 0 0 20 40 60 80 100 Sampling Rate (%) - 24 -

  25. Conclusion TxRace HTM-based fast-path(most of the time) SW-based slow-path(on-demand) 11.68x -> 4.65x Performance TxRace TSan Completeness Soundness Recall: 0.95 - 25 -

  26. Q&A Thank you! - 26 -

  27. Performance overhead baseline xbegin/xend 12 11 10 Runtime Overhead 9 8 Transaction overhead is low 7 Large number of short transactions 6 5 4 3 2 1.16 1 0 - 27 -

  28. Performance overhead baseline xbegin/xend conflict aborts 12 11 10 Runtime Overhead 9 8 7 6 5 4 3 2.73 2 1 0 - 28 -

  29. Performance overhead baseline xbegin/xend conflict aborts capacity aborts 12 11 10 Runtime Overhead 9 8 7 6 5 4 3 2 1 0 - 29 -

  30. Performance overhead baseline xbegin/xend conflict aborts 63.3x capacity aborts unknown aborts 12 11 10 31.6x Runtime Overhead 9 8 7 6 5 4 3 2 1 0 - 30 -

  31. False Negative Transactions finish and escape before being artifically aborted T1 T2 T3 W(TxFail) R(TxFail) R(TxFail) R(TxFail) X=1 X=2 Abort - 31 -

Related