Atomics and Parallelism in Programming

Atomics Variables: Not
radioactive, but you probably
still don’t want to touch them!
Roy Margalit
Parallelism for faster programs!
 
Races Prior To Atomics
Accessing the same memory location from 2 threads where one of
them writes is undefined behavior
Undefined behavior
Cache
Optimizations
Cache
Optimization (Single Thread)
for (int i=0; i<
ROWS
; ++i)
  for (int j=0; j<
COLS
; ++j)
     c[i][j] = a[i][j] + b[i][j];
for (int j=0; j<
COLS
; ++j)
  for (int i=0; i<
ROWS
; ++i)
    c[i][j] = a[i][j] + b[i][j];
Let’s see a live demo
Matrix summing with optimization from previous slide
Watch for sentinel value
 
Wait for value to change
Before
compiler
optimizations
After
compiler
optimizations
Memory Access Order
1
2
3
1
2
3
core.atomic To The Rescue
Defined behavior for race
Sequential Consistency
atomicLoad(foo);
atomicStore(foo);
Sequential Consistency (SC) Access
Can be considered in terms of interleaving the statements from
various threads.
Sequential Consistency Example (Store-Buffer)
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
x
 = 0;   
y
 = 0;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x
 = 1;
a = 
y
;
x, y
 are global variables; a, b are local
y
 = 1;
b = 
x
;
Sequential Consistency Example
x, y
 are global variables; a, b are local
x
 = 1;
a = 
y
;
y
 = 1;
b = 
x
;
x
 = 0;   
y
 = 0;
Lost Performance
SC forces cache flushes
SC prevents reordering of statements
C++ Memory Model
Weaker memory access that is cheaper
Races are still defined
Price to pay: Less synchronization
Relaxed
Cheapest atomic
No synchronization
Monotonic (Total order over
writes to 
x
)
Can be reordered between
different variables
x
 = 1;
x
 = 2;
a = 
x
;
assert(
x
 >= a);
x
 = 0;
Release / Acquire
Synchronizes threads.
Things before the 
release
happened before the 
acquire
.
x
 = 1;
y
 = 1;
a = 
y
; 
// 1
b = 
x
; 
// 1
x
 = 0;   
y
 = 0;
Store Buffer
Live Demo
a = 0   &&    b = 0
??
x
 = 1;
a = 
y
;
y
 = 1;
b = 
x
;
x
 = 0;   
y
 = 0;
Testing Is Not Enough
Testing on one architecture (x86) will not show issues with others
(Power-PC, ARM).
X86-TSO gives more guarantees than even Release/Acquire.
Message-Passing
Live Demo
a == 1  &&  b == 0
??
x
 = 1;
y
 = 1;
a = 
y
;
b = 
x
;
x
 = 0;   
y
 = 0;
Current compilers are afraid of optimizing atomics
A Short Story: 
Peterson’s algorithm in C++
In 1981, Peterson proposed a simple algorithm for
critical section in shared memory.
It assumes sequential consistent shared memory
(SC).
Q: How to implement Peterson’s algorithm in 
C/C++11
?
C++ atomics and memory ordering, blog post by 
Bartosz Milewski
https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/
A Short Story: 
Peterson’s algorithm in C++
A Short Story: 
Peterson’s algorithm in C++
A subsequent post by Anthony Williams analyzed both algorithms:
Bartosz’s implementation is indeed wrong.
Dmitriy’s implementation is correct.
https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C+
+0x_atomics.html
"Any time you deviate from SC, you increase the complexity of
the problem by orders of magnitude."
Fast forward to 2020
A solution in the future?
How does my research try to
solve this problem?
 
Verification
Verifying programs under weak memory models (WMM) is difficult
We’d like to verify programs assuming SC:
Goal
 
Automatically
 establish 
robustness
 of programs against WMM
 Our focus is on 
C/C++11 memory model
:
 
How does the C++ model describe behaviors?
x
 = 1;
y
 = 1;
a = 
y
;
b = 
x
;
x
 = 0;   
y
 = 0;
Example: “message-passing” litmus test
Initial state
final state
SC-
in
consistent
Linearization
:
SC-consistent
 
Proof idea:
Robustness against C11 is 
decidable
 and 
PSPACE-complete
.
Reduction to
 
reachability
 
under
 
an instrumented SC semantics
robustness
vio
l
ation!
W x 0
W y 0
W
 y 
1
W x 0
W y 0
R
 y 1
W
 y 
1
W x 1
Rocker
A static automated tool for checking robustness of simple programs.
Robustness is verified under SC
Results (Release/Acquire)
Results (Relaxed)
Thank you!
Rocker – https://github.com/rymrg/rocker/
Robustness against C11 is 
decidable
 and 
PSPACE-complete
.
You can try the tool without having to understand the memory model!
Slide Note
Embed
Share

Explore the world of atomics, parallelism, memory access optimizations, and sequential consistency in programming. Dive into concepts such as races in multithreading, cache optimizations, and the importance of memory access order before and after compiler optimizations. Witness live demos showcasing matrix summing optimization and learn about core atomic operations. Gain insights into how access can be considered in terms of interleaving statements from different threads, and understand examples of sequential consistency with store buffers and variable values.

  • Atomics
  • Parallelism
  • Memory Access
  • Optimization
  • Multithreading

Uploaded on Nov 19, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Atomics Variables: Not radioactive, but you probably still don t want to touch them! Roy Margalit

  2. Parallelism for faster programs!

  3. Races Prior To Atomics Accessing the same memory location from 2 threads where one of them writes is undefined behavior Undefined behavior Cache Optimizations

  4. Cache RAM Cache Cache CPU 2 CPU 1

  5. Optimization (Single Thread) for (int i=0; i<ROWS; ++i) for (int j=0; j<COLS; ++j) c[i][j] = a[i][j] + b[i][j]; for (int j=0; j<COLS; ++j) for (int i=0; i<ROWS; ++i) c[i][j] = a[i][j] + b[i][j];

  6. Lets see a live demo Matrix summing with optimization from previous slide Watch for sentinel value ?1,1 ??,1 ?1,? ??,? ?1,1 ??,1 ?1,? ??,? ?1,1 ??,1 ?1,? ??,? = + Wait for value to change

  7. Memory Access Order Before compiler optimizations 1 2 3 After compiler optimizations 3 2 1

  8. core.atomic To The Rescue Defined behavior for race Sequential Consistency atomicLoad(foo); atomicStore(foo);

  9. Sequential Consistency (SC) Access Can be considered in terms of interleaving the statements from various threads.

  10. Sequential Consistency Example (Store-Buffer) x = 0; y = 0; x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  11. Sequential Consistency Example Variable x y a b Value 0 0 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  12. Sequential Consistency Example Variable x y a b Value 1 0 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  13. Sequential Consistency Example Variable x y a b Value 1 0 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  14. Sequential Consistency Example Variable x y a b Value 1 1 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  15. Sequential Consistency Example Variable x y a b Value 1 1 0 1 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  16. Sequential Consistency Example Variable x y a b Value 0 0 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  17. Sequential Consistency Example Variable x y a b Value 1 0 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  18. Sequential Consistency Example Variable x y a b Value 1 1 0 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  19. Sequential Consistency Example Variable x y a b Value 1 1 1 0 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  20. Sequential Consistency Example Variable x y a b Value 1 1 1 1 x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  21. Sequential Consistency Example a 0 0 1 1 b 0 1 0 1 Possible? No Yes Yes Yes x = 0; y = 0; x = 1; a = y; y = 1; b = x; x, y are global variables; a, b are local

  22. Lost Performance SC forces cache flushes SC prevents reordering of statements

  23. C++ Memory Model Weaker memory access that is cheaper Races are still defined Price to pay: Less synchronization

  24. Relaxed Cheapest atomic No synchronization Monotonic (Total order over writes to x) Can be reordered between different variables x = 0; x = 1; x = 2; a = x; assert(x >= a);

  25. Release / Acquire Synchronizes threads. Things before the release happened before the acquire. x = 0; y = 0; x = 1; y = 1; a = y; // 1 b = x; // 1

  26. Store Buffer x = 0; y = 0; x = 1; a = y; y = 1; b = x; a = 0 && b = 0 ?? Live Demo

  27. Testing Is Not Enough Testing on one architecture (x86) will not show issues with others (Power-PC, ARM). X86-TSO gives more guarantees than even Release/Acquire.

  28. Message-Passing x = 0; y = 0; x = 1; y = 1; a = y; b = x; a == 1 && b == 0 ?? Live Demo

  29. Current compilers are afraid of optimizing atomics

  30. A Short Story: Petersons algorithm in C++ In 1981, Peterson proposed a simple algorithm for critical section in shared memory. It assumes sequential consistent shared memory (SC). CPU CPU CPU Memory W x 0 W y 0 Q: How to implement Peterson s algorithm in C/C++11? W x 1 W y 1 R y 0 R x 0

  31. A Short Story: Petersons algorithm in C++ C++ atomics and memory ordering, blog post by Bartosz Milewski https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/

  32. A Short Story: Petersons algorithm in C++ A subsequent post by Anthony Williams analyzed both algorithms: Bartosz s implementation is indeed wrong. Dmitriy s implementation is correct. https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C+ +0x_atomics.html "Any time you deviate from SC, you increase the complexity of the problem by orders of magnitude."

  33. Fast forward to 2020 A solution in the future?

  34. How does my research try to solve this problem?

  35. Verification Verifying programs under weak memory models (WMM) is difficult We d like to verify programs assuming SC: verification under weak memory verification under sequential consistency = + ??? Robustness: ???? ??= ???? ???

  36. Goal verification under weak memory verification under sequential consistency = + robustness Robustness: ???? ??= ???? ??? Automatically establish robustness of programs against WMM Our focus is on C/C++11 memory model: ???? ??= ???? ?11

  37. How does the C++ model describe behaviors? W x 0 W y 0 x = 0; y = 0; x = 1; y = 1; a = y; b = x; W x 1 R y 1 W y 1 R x 0

  38. Example: message-passing litmus test a = y b = x a = y b = x x = 1 y = 1 a = y b = x y = 1 b = x a = 0 b = 0 a = 0 b = 0 a = 1 b = 0 a = 0 b = 0 a = 1 b = 0 W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 1 R y 1 W x 1 W x 1 R y 1 W x 1 W y 1 R x 0 W y 1 W y 1 Wx1 T1 Wy1 T1 Ry1 T2 Rx0 T2 (?0,?0) (?3,?3) (?1,?1) (?2,?2) (?4,?4) ?11 ?11 ?11 ?11 Initial state final state

  39. ?,?.(?0,?0) ?11 (?,?) ? is SC consistent SC-consistent W x 0 W y 0 Linearization: W x 1 R y 1 Init: W x 0 W y 0 T1: W x 1 W y 1 W y 1 T2: R y 1 W x 0 W y 0 SC-inconsistent W x 1 R y 1 W y 1 R x 0

  40. Robustness against C11 is decidable and PSPACE-complete. Reduction toreachability under an instrumented SC semantics Proof idea: a minimal robustness violation: SC-consistent SC-inconsistent (?0,?0) ?11(?1,?1) ?11(?2,?2) ?11 ?11(??,??) ?11(??+1,??+1) Can take a C11-step to a non-SC execution graph ,?1) ??(?2 ?1 ,?2) ?? ?? (??,??) ?2 (?0,?0) ?? (?1 ?0 Robustness information ??

  41. x = 1 y = 1 y = 1 b = x a = y b = x a = y b = x b = x y = 1 a = 0 b = 0 a = 1 b = 0 a = 0 b = 0 a = 0 b = 0 a = 1 b = 0 W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 1 W x 1 W x 1 R y 1 W x 1 R y 1 W y 1 R x 0 W y 1 W y 1 Wx1 T1 Wy1 T1 Ry1 T2 Rx0 T2 (?0,?0) (?3,?3) (?1,?1) (?4,?4) (?2,?2) ?11 ?11 ?11 ?11 ,?1) ?1 ,?2) ?2 robustness violation! (?0,?0) ?0 (?3,?3) ?3 (?1 (?2 ?? ?? ?? In C11, T2 may not read from Wx1, but in SC it must.

  42. Rocker A static automated tool for checking robustness of simple programs. Robustness is verified under SC

  43. Results (Release/Acquire) Program Robustness Time (seconds) Peterson (SC) 1.0 Peterson (Bratosz) 1.1 Peterson (Dmitriy) 1.2 Store-Buffer 0.9 Message-Passing 0.9

  44. Results (Relaxed) Program Robustness Time (seconds) Peterson (Dmitriy) 1.2 Peterson (Fix1) 1.3 Peterson (Fix2) 1.3 Store-Buffer 0.9 Message-Passing 0.9

  45. Robustness against C11 is decidable and PSPACE-complete. robust verification problem assuming SC input program with C11-like atomics and non-atomics SPIN model checker Rocker not robust You can try the tool without having to understand the memory model! Rocker https://github.com/rymrg/rocker/ Thank you!

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#