Relaxed Consistency
Finale
DAVID DEVECSERY
Administration!
Syllabus quiz (last warning!) – On Canvas!
Review 1 is graded.
If you submitted but don’t have a grade, let me know
If you got a comment, you should adjust future reviews
If you didn’t get a comment, you had a fine review
Sign-up for leading discussions coming soon!
Today:
Higher-level discussion
What do programmers need?
What is unacceptable?
What are the costs of providing these abstractions?
Some more consistency background and semantics
Can we do better than barriers?
We “Frighten Small Children”
Finishing up DRF-0 discussion
Advantages
Disadvantages
Do you believe the DRF-0 argument?
Is it really best to define memory models by what's needed to achieve SC?
Is there a better definition?
Should all programs really be data-race free?
There are legal programs with data races
Do we really need all of the performance that comes from as weak a memory model as possible?
What's the performance gain/cost of weaker architectures?
Partial consistency: locking
Will this code enforce mutual exclusion on a processor-consistent (PC) memory model?
Lock:
    do {
        tmp = xchg(unlocked, 0);
        if (tmp != 1) {
            while (unlocked != 1) ;
        }
    } while (tmp != 1);

Unlock:
    unlocked = 1;

xchg(x, y):
    atomic {
        old = x;
        x = y;
        return old;
    }

Hint: think of using the synchronization to lock another variable:
    lock();
    y = 1;
    unlock();
Relaxed consistency: barriers
x86 [X]FENCEs: all X instructions before the fence will be globally visible to all X instructions following the fence
    MFENCE (loads and stores) – Handout provided
    SFENCE (stores only) – Handout provided
    LFENCE (loads only) – Handout provided
NOTE: x86 also has a LOCK prefix; it is needed and related to fences, but different
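As a portable illustration of a full fence, C11's `atomic_thread_fence(memory_order_seq_cst)` is what compilers typically lower to MFENCE (or a LOCKed instruction) on x86. Below is a sketch of thread 1 of the classic store-buffering test with a full fence inserted; the function name and the two-variable setup are illustrative, not from the handout.

```c
#include <stdatomic.h>

atomic_int x, y;   /* both start at 0 */

/* Thread 1 of the store-buffering litmus test, with a full fence:
 * the fence forces the store to x to become globally visible before
 * the load of y executes. */
int t1_with_fence(void) {
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* ~ MFENCE on x86 */
    return atomic_load_explicit(&y, memory_order_relaxed);
}
```

If thread 2 symmetrically does `y = 1; fence; read x`, the outcome where both threads read 0 is ruled out; without the fences, x86's store buffers permit it.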
Partial consistency: barriers
Can we use fences to make this mutually exclusive?
Lock:
    do {
        tmp = xchg(unlocked, 0);
        if (tmp != 1) {
            while (unlocked != 1) ;
        }
    } while (tmp != 1);
    MFENCE

Unlock:
    MFENCE
    unlocked = 1;

xchg(x, y):
    atomic {
        old = x;
        x = y;
        return old;
    }
In what ways can we reorder operations?
Data Reordering
Read-read
Read-write
Write-write
Memory Atomicity
When a store goes to memory, is it visible to all processors?
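One way to make "what reorderings are allowed" concrete is to enumerate outcomes. A minimal sketch (the function name and setup are mine, not from the slides): it brute-forces every sequentially consistent interleaving of the two-thread store-buffering test and checks that r1 == r2 == 0 never occurs. A machine that reorders a write with a later read (as x86-TSO does) can produce exactly that extra outcome.

```c
#include <stdbool.h>

/* Brute-force all SC interleavings of:
 *   T1: x = 1; r1 = y;      T2: y = 1; r2 = x;
 * Returns true iff the outcome r1 == 0 && r2 == 0 is reachable.
 * Under SC it never is: whichever read runs last sees the other
 * thread's completed store. */
bool sb_both_zero_reachable(void) {
    /* the 6 ways to interleave two 2-instruction threads */
    static const int sched[6][4] = {
        {1,1,2,2}, {1,2,1,2}, {1,2,2,1},
        {2,1,1,2}, {2,1,2,1}, {2,2,1,1},
    };
    for (int i = 0; i < 6; i++) {
        int x = 0, y = 0, r1 = 1, r2 = 1;
        int pc1 = 0, pc2 = 0;           /* per-thread program counters */
        for (int s = 0; s < 4; s++) {
            if (sched[i][s] == 1) {
                if (pc1++ == 0) x = 1; else r1 = y;
            } else {
                if (pc2++ == 0) y = 1; else r2 = x;
            }
        }
        if (r1 == 0 && r2 == 0)
            return true;
    }
    return false;
}
```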
What reorderings should be allowed here?

Thread 1:
    1. local1 = 10
    2. lock()
    3. shared1 = true
    4. local2 = shared2;
    5. unlock()
    6. local3 = local1
    7. local4 = local2

What are the highest and lowest points line 4 can be reordered to?
What are the highest and lowest points line 3 can be reordered to?
What about line 6? Line 7?
Partial consistency: barriers
Can we use fences to make this mutually exclusive?

Lock:
    do {
        tmp = xchg(unlocked, 0);
        if (tmp != 1) {
            while (unlocked != 1) ;
        }
    } while (tmp != 1);
    MFENCE

Unlock:
    MFENCE
    unlocked = 1;

xchg(x, y):
    atomic {
        old = x;
        x = y;
        return old;
    }
Is this an optimal lock implementation?
What constraints do we need (instead of a fence) to make it optimal?
Thread 1:
    1. local1 = 10
    2. lock()        <- acquire semantics
    3. shared1 = true
    4. local2 = shared2;
    5. unlock()      <- release semantics
    6. local3 = local1
    7. local4 = local2

Reordering limits:
    Acquire semantics: later operations cannot be reordered above the acquire
    Release semantics: earlier operations cannot be reordered below the release
Partial consistency: barriers
Can we use fences to make this mutually exclusive?
Lock:
    do {
        tmp = xchg(unlocked, 0, AcquireSemantics);
        if (tmp != 1) {
            while (unlocked != 1) ;
        }
    } while (tmp != 1);

Unlock:
    Store(unlocked, 1, ReleaseSemantics);

xchg(x, y):
    atomic {
        old = x;
        x = y;
        return old;
    }
Is this an optimal lock implementation?
What constraints do we need (instead of a fence) to make it optimal?
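The acquire/release version of the slide's lock maps directly onto C11 atomics. A minimal sketch (names follow the pseudocode; this rendering is mine, not the handout's):

```c
#include <stdatomic.h>

static atomic_int unlocked = 1;   /* 1 = free, 0 = held */

/* xchg(unlocked, 0, AcquireSemantics): nothing after the acquire may
 * be reordered above it, so critical-section accesses stay below. */
void lock(void) {
    int tmp;
    do {
        tmp = atomic_exchange_explicit(&unlocked, 0, memory_order_acquire);
        if (tmp != 1) {
            /* spin with plain loads until the lock looks free */
            while (atomic_load_explicit(&unlocked, memory_order_relaxed) != 1)
                ;
        }
    } while (tmp != 1);
}

/* Store(unlocked, 1, ReleaseSemantics): nothing before the release may
 * be reordered below it, so critical-section writes are visible first. */
void unlock(void) {
    atomic_store_explicit(&unlocked, 1, memory_order_release);
}
```

Compared with the MFENCE version, this constrains ordering in only one direction at each end of the critical section, which is all mutual exclusion needs; a full fence also blocks harmless motion, such as the local accesses on lines 6–7 moving into the critical section.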
Shifting Gears: Silently Shifting Semicolon
The authors argue that for "safe" languages (e.g. Java), the statement-ordering abstraction is essential:
We shouldn't try to create faster unsafe code; instead, we should try to make safe code faster.
Different from DRF-0:
    DRF-0 is fast-by-default, with a burdensome programmer interface
    They argue for safety-by-default, with unclear performance implications
Should we prioritize safety, or performance?
Traditionally, work has prioritized performance
Is it clear that there is a significant performance cost to prioritizing safety?
    "Existence proof": we program in this environment today, and it's fine
Work prioritizing safety is promising, but has gained little traction
    <30% overhead for SC on some server applications
    Some languages (e.g. Python) enforce SC via uni-core execution
Linux Memory Model:
Are you frightened, or just disconcerted?
What does a world without DRF0 look like?
What does it take to formally reason about all of these weak memory
models?
This paper explores the significant effort required to formally reason about the Linux memory model
Case study on the RCU data structure

What primitives exist?
Reason about several primitives
    Translate them into read (R), write (W), and fence (F) operations
    Specify operations and data-flow rules in the cat language
    A formal language that allows the operations and their orderings to be specified precisely
Core model
Rules for:
Sequential consistency per variable
Atomicity of read-modify-write
Happens-before
Propagates-before (constrains propagation of writes across fences)
This seems simple enough, but once they explain
implementation, it gets complex quickly!
Many more details
Did you get lost at some points?
I got lost at some points.
Formally modeling behaviors on general relaxed-consistency processors in something as complex as the Linux kernel is a very complicated matter!
Even for these small kernel algorithms evaluated by this paper!
Enter conference slides
Found at: http://www.rdrop.com/users/paulmck/scalability/paper/slides-asplos18.pdf
This is so complex, is it useful?
Even though this tool is so complicated, Linux developers find it useful!
The paper cites a mailing-list entry in which a developer references a verified model of a fix.
Is it really reasonable to expect programmers to reason at
this level?
Next class
No class Monday, Sep. 3!
Entering into the realm of compilers on Sep. 5
Enjoy Labor Day