Efficient Data Race Detection Using Transactional Memory

undefined
Tong Zhang
, Dongyoon Lee, Changhee Jung
Computer Science Department
TxRace: Efficient Data Race Detection
Using Commodity Hardware
Transactional Memory
Data Races in Multithreaded Programs
Two threads access the same shared location (one is write)
Not ordered by synchronizations
 
Thread 1
 
Thread 2
 
MySQL bug #3596
Race Conditions Caused Severe Problems
 
• 50+ million people lost power
• Cost an estimated $6 billion
 
• About 30 million shares’ worth
   of trading were affected
• Cost an estimated $13 million
State-Of-The-Art Dynamic Data Race Detector
 
Software based solutions
FastTrack 
[PLDI’09]
Intel Inspector XE
Google Thread Sanitizer
...
 
Hardware based solutions
ReEnact 
[ISCA’03]
CORD 
[HPCA’06]
SigRace 
[ISCA’09]
 
Sound (no false negatives)
 
Complete (no false positives)
 
High overhead (10-100x)
 
Low overhead
 
Custom hardware
 
 
 
 
 
Our Approach
 
Hybrid SW + (existing) HW solution
 
Leverage the data conflict detection mechanism of
Hardware Transactional Memory
 (HTM) in commodity
processors for lightweight data race detection
 
Low overhead
 
No custom hardware
 
 
Outline
Motivation
Background: Transactional Memory
TxRace: Design and Implementation
Experiments
Conclusion
Transactional Memory (TM)
Allow a group of instructions (a transaction) to execute
in an 
atomic
 manner
 
Read(X)
 
Write(X)
Abort
 
Read(X)
Challenge 1: Unable to Pinpoint Racy Instructions
 
When a transaction gets aborted, we know that there was
a data conflict between transactions
 
 
 
 
 
 
 
 
However, we 
DO NOT
 know WHY and WHERE
- e.g. which instruction? at which address? Which
transaction caused the conflict?
 
Read(X)
 
Write(X)
Abort
 
?
Challenge 2: False Conflicts 
False Positives
 
Read(
X
)
 
Write(
Y
)
Abort
HTM detects data conflicts at the cache-line granularity
    → False positives
 
False transaction abort
without data race
Challenge 3. Non-conflict Aborts
Best-effort (non-ideal) HTM with limitations
     → Transaction may get aborted without data conflicts
     → False negatives (if ignored)
 
Read(X)
 
Write(Y)
 
Read(Z)
 
.
.
.
X
Y
Z
Abort
 
Hardware Buffer
 
“Capacity” Abort
 
Read(X)
 
Write(Y)
 
I/O syscall()
Abort
 
“Unknown” Abort
Outline
Motivation
Background: Transactional Memory
TxRace: Design and Implementation
Experiments
Conclusion
Potential data
 races
TxRace: 
Two-phase Data Race Detection
 
Fast-path
(HTM-based)
Slow-path
(SW-based)
 
 Fast
 Unable to pinpoint races
 False sharing(false positive)
 Non-conflict aborts(false
negative)
 
 Sound(no false negative)
C
o
m
p
l
e
t
e
(
n
o
 
f
a
l
s
e
 
p
o
s
i
t
i
v
e
)
 
Slow
 
Intel Haswell
(RTM)
 
Google ThreadSanitizer
(TSan)
 
 
 
 
 
 
 
Compile-time Instrumentation
 
Fast-path: convert sync-free regions into transactions
Slow-path: add Google TSan checks
 
Lock()
 
Unlock()
 
Thread1
 
Sync-free
 
Sync-free
 
Lock()
 
Thread2
 
Unlock()
 
X=1
 
X=2
Fast-path HTM-based Detection
 
Leverage HW-based data conflict detection in HTM
Problem
: On conflict, one transaction gets aborted, but all
others just proceed → slow-path missed racy transactions
 
X=1
 
X=2
Abort
Already passed
Fast-path HTM-based Detection
Leverage HW-based data conflict detection in HTM
Problem
: On conflict, one transaction gets aborted, but all
others just proceed → Cannot switch to slow-path
Solution
: Abort in-flight transactions 
artificially
 
X=1
 
X=2
 
R(TxFail
)
 
R(TxFail
)
 
R(TxFail
)
 
W(TxFail
)
Abort
Abort
Rollback all
Abort
Slow-path SW-based Detection
Use SW-based sound and complete data race detection
-
Pinpoint racy instructions
-
Filter out false positives (due to false sharing)
-
Handle non-conflict (e.g., capacity) aborts conservatively
 
X=1
 
X=2
Abort
Abort
SW-based
detection
Abort
Implementation
 
Two-phase data race detection
Fast-path: Intel’s Haswell Processor
Slow-path: Google’s Thread Sanitizer
 
Instrumentation
LLVM compiler framework
Compile-time & profile-guided optimizations
 
Evaluation
PARSEC benchmark suites with simlarge input
Apache web server with 300K requests from 20 clients
4 worker threads (4 hardware transactions)
Outline
Motivation
Background: Hardware Transactional Memory
TxRace: Design and Implementation
Experiments
1) Performance
2) Soundness (Detection capability)
3) Cost-effectiveness
Conclusion
1. Performance Overhead
 
11.68x
 
4.65x
 
>10x reduction
2. Soundness (Detection Capability)
 
False Negative
Recall:0.95
False Negatives
Due to non-overlapped transactions
X=1
X=2
False Negatives Case Study in 
vips
Repeat the experiment to exploit different interleaving
3. Cost-effectiveness Compared to Sampling
TxRace vs. Tsan with Sampling
Overhead equivalent
to naïve sampling at
25.5%
Recall compared to sampling
TxRace: Less overhead
 + High recall
Spend 25.5%
Get 47.2%
Conclusion
TxRace
HTM-based fast-path(most of the time)
SW-based slow-path(on-demand)
 
Recall: 0.95
 
11.68x -> 4.65x
Q&A
Thank you!
Performance overhead
 
1.16
Large number of
short transactions
Transaction
o
verhead
is low
Performance overhead
 
2.73
Performance overhead
 
Performance overhead
 
False Negative
 
X=1
 
X=2
Abort
 
W(TxFail)
Transactions finish and escape before being artifically
aborted
Slide Note

Good afternoon, my name is Tong, I come from VirginiaTech. Today I’m going to talk about our paper: TxRace, Efficient data race detection using commodity hardware transactional memory. This work is done supervised by my advisor Dongyoon Lee and Changhee Jung.

Embed
Share

This presentation discusses data race detection in multithreaded programs, exploring the impact of race conditions and the state-of-the-art dynamic data race detector solutions. It introduces a hybrid software and hardware approach leveraging Hardware Transactional Memory for lightweight data race detection, with a focus on low overhead and no custom hardware requirement. Transactions in Transactional Memory, challenges in pinpointing racy instructions, and solutions like FastTrack, Intel Inspector XE, Google Thread Sanitizer, ReEnact, CORD, and SigRace are also covered.

  • Data Race Detection
  • Transactional Memory
  • Multithreaded Programs
  • Race Conditions
  • Hardware Solutions

Uploaded on Jul 23, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory Tong Zhang, Dongyoon Lee, Changhee Jung Computer Science Department - 1 -

  2. Data Races in Multithreaded Programs Two threads access the same shared location (one is write) Not ordered by synchronizations Thread 1 Thread 2 p = NULL if (p) { crash fput(p, ) } MySQL bug #3596 - 2 -

  3. Race Conditions Caused Severe Problems 50+ million people lost power Cost an estimated $6 billion Northeast Blackout of 2003 About 30 million shares worth of trading were affected Cost an estimated $13 million Stock Price Mismatch in 2012 - 3 -

  4. State-Of-The-Art Dynamic Data Race Detector Software based solutions FastTrack [PLDI 09] Intel Inspector XE Google Thread Sanitizer ... Sound (no false negatives) Complete (no false positives) High overhead (10-100x) Hardware based solutions ReEnact [ISCA 03] CORD [HPCA 06] SigRace [ISCA 09] Low overhead Custom hardware - 4 -

  5. Our Approach Hybrid SW + (existing) HW solution Leverage the data conflict detection mechanism of Hardware Transactional Memory (HTM) in commodity processors for lightweight data race detection Low overhead No custom hardware - 5 -

  6. Outline Motivation Background: Transactional Memory TxRace: Design and Implementation Experiments Conclusion - 6 -

  7. Transactional Memory (TM) Allow a group of instructions (a transaction) to execute in an atomic manner Thread1 Thread2 Read(X) Transaction begin time Transaction end Write(X) Abort Data conflict Rollback Read(X) - 7 -

  8. Challenge 1: Unable to Pinpoint Racy Instructions When a transaction gets aborted, we know that there was a data conflict between transactions Thread1 Thread2 ? Read(X) Write(X) Abort However, we DO NOT know WHY and WHERE - e.g. which instruction? at which address? Which transaction caused the conflict? - 8 -

  9. Challenge 2: False Conflicts False Positives HTM detects data conflicts at the cache-line granularity False positives Thread1 Thread2 False transaction abort without data race Read(X) Write(Y) Abort X Y Cache line - 9 -

  10. Challenge 3. Non-conflict Aborts Best-effort (non-ideal) HTM with limitations Transaction may get aborted without data conflicts False negatives (if ignored) Read(X) Write(Y) Read(Z) . . . Read(X) Write(Y) Z I/O syscall() Y X Abort Abort Hardware Buffer Unknown Abort Capacity Abort - 10 -

  11. Outline Motivation Background: Transactional Memory TxRace: Design and Implementation Experiments Conclusion - 11 -

  12. TxRace: Two-phase Data Race Detection Potential data races Slow-path (SW-based) Fast-path (HTM-based) Fast Sound(no false negative) Complete(no false positive) Slow Unable to pinpoint races False sharing(false positive) Non-conflict aborts(false negative) Google ThreadSanitizer (TSan) Intel Haswell (RTM) - 12 -

  13. Compile-time Instrumentation Fast-path: convert sync-free regions into transactions Slow-path: add Google TSan checks Thread2 Thread1 Lock() Transaction begin Transaction end X=2 Sync-free X=1 Lock() Unlock() Sync-free Unlock() - 13 -

  14. Fast-path Slow-path Fast-path HTM-based Detection Leverage HW-based data conflict detection in HTM Problem: On conflict, one transaction gets aborted, but all others just proceed slow-path missed racy transactions Thread1 Thread2 Thread3 Already passed X=1 Abort X=2 - 14 -

  15. Fast-path Slow-path Fast-path HTM-based Detection Leverage HW-based data conflict detection in HTM Problem: On conflict, one transaction gets aborted, but all others just proceed Cannot switch to slow-path Solution: Abort in-flight transactions artificially Thread3 Thread1 Thread2 Rollback all R(TxFail) R(TxFail) R(TxFail) X=1 X=2 Abort Abort Abort W(TxFail) - 15 -

  16. Fast-path Slow-path Slow-path SW-based Detection Use SW-based sound and complete data race detection - Pinpoint racy instructions - Filter out false positives (due to false sharing) - Handle non-conflict (e.g., capacity) aborts conservatively Thread3 Thread1 Thread2 SW-based detection X=1 X=2 Abort Abort Abort - 16 -

  17. Implementation Two-phase data race detection Fast-path: Intel s Haswell Processor Slow-path: Google s Thread Sanitizer Instrumentation LLVM compiler framework Compile-time & profile-guided optimizations Evaluation PARSEC benchmark suites with simlarge input Apache web server with 300K requests from 20 clients 4 worker threads (4 hardware transactions) - 17 -

  18. Outline Motivation Background: Hardware Transactional Memory TxRace: Design and Implementation Experiments 1) Performance 2) Soundness (Detection capability) 3) Cost-effectiveness Conclusion - 18 -

  19. 1. Performance Overhead TSan TxRace 30 63x 1195x Runtime Overhead 25 >10x reduction 20 15 11.68x 10 4.65x 5 0 - 19 -

  20. 2. Soundness (Detection Capability) Recall:0.95 TSan TxRace 10 Number of Race detected 79 64 64 112 False Negative 8 6 4 2 0 - 20 -

  21. False Negatives Due to non-overlapped transactions X=1 Transaction begin Transaction end time X=2 - 21 -

  22. False Negatives Case Study in vips Repeat the experiment to exploit different interleaving 120 # of detected data races 100 All detected 80 60 40 20 0 1 2 3 4 5 6 7 # of iterations - 22 -

  23. 3. Cost-effectiveness Compared to Sampling TxRace vs. Tsan with Sampling Bodytrack Overhead 1 Runtime Overhead 0.8 0.69 TxRace 0.6 Overhead equivalent to na ve sampling at 25.5% 0.4 25.5% 0.2 TSan+Sampling 0 0 20 40 60 80 100 Sampling Rate (%) - 23 -

  24. Recall compared to sampling TxRace: Less overhead + High recall Recall =???????? ???? ???? ????? ????? ???? ???? ????? Bodytrack Recall 1 0.8 0.75 TxRace Spend 25.5% Get 47.2% 0.6 Recall 0.4 47.2% 0.2 TSan+Sampling 0 0 20 40 60 80 100 Sampling Rate (%) - 24 -

  25. Conclusion TxRace HTM-based fast-path(most of the time) SW-based slow-path(on-demand) 11.68x -> 4.65x Performance TxRace TSan Completeness Soundness Recall: 0.95 - 25 -

  26. Q&A Thank you! - 26 -

  27. Performance overhead baseline xbegin/xend 12 11 10 Runtime Overhead 9 8 Transaction overhead is low 7 Large number of short transactions 6 5 4 3 2 1.16 1 0 - 27 -

  28. Performance overhead baseline xbegin/xend conflict aborts 12 11 10 Runtime Overhead 9 8 7 6 5 4 3 2.73 2 1 0 - 28 -

  29. Performance overhead baseline xbegin/xend conflict aborts capacity aborts 12 11 10 Runtime Overhead 9 8 7 6 5 4 3 2 1 0 - 29 -

  30. Performance overhead baseline xbegin/xend conflict aborts 63.3x capacity aborts unknown aborts 12 11 10 31.6x Runtime Overhead 9 8 7 6 5 4 3 2 1 0 - 30 -

  31. False Negative Transactions finish and escape before being artifically aborted T1 T2 T3 W(TxFail) R(TxFail) R(TxFail) R(TxFail) X=1 X=2 Abort - 31 -

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#