Concurrent Revisions: A Model for Deterministic Concurrency

 
Concurrent Revisions:
A deterministic concurrency model.
 
Daan Leijen, Alexandro Baldassin,
and  Sebastian Burckhardt
Microsoft Research
(OOPSLA 2010)
The concurrency elephant
 
Task/Data Parallel
: TPL, X10, Cilk, StreamIt,
Cuda, OpenMP, etc.
Concurrent
: Thread, Locks, Promises,
Transactions, etc.
Our focus
: Concurrent interactive applications
with large shared data structures.
 
Application = Shared Data and Tasks
Shared Data
Reader
Mutator
Reader
 
Example: Office application
Save the document
React to keyboard input by the user
Perform a spellcheck in the background
Exchange updates with remote users
Mutator
Reader
 
Spacewars!
 
About 15k lines of C# code, using DirectX. The original game is sequential
 
Example 1:
 
read-write conflict
Render task
 
reads position of all game objects
Physics task 
updates position of all game objects
 
=> Render task needs to see consistent snapshot
 
Example 2: 
write-write conflict
Physics task 
updates position of all game objects
Network task 
updates position of some objects
 
=> Network has priority over physics updates
 
 
Examples from SpaceWars Game
 
Conflicting tasks can not efficiently execute in
parallel.
pessimistic concurrency control (i.e. 
locks
)
use locks to avoid parallelism where
there are (real or potential) conflicts
optimistic concurrency control (i.e. 
TM
)
speculate on absence of conflicts
rollback if there are real conflicts
  either way: true conflicts kill parallelism
.
 
Conventional Concurrency Control
Our Proposed Programming Model:
Revisions and Isolation Types
 
Deterministic Conflict Resolution
, never roll-back
No restrictions on tasks (can be long-running, do I/O)
Full concurrent reading and writing of shared data
Clean semantics (see technical report)
Fast and space-efficient runtime implementation
 
 
 
Revision
A logical unit of
work that is forked
and joined
Isolation Type
A type which implements
automatic  copying/merging of
versions on write-write conflict
What’s new
Isolation
: side effects
are only visible when the
revision is joined.
Deterministic
 execution!
int
 x = 0;
Task
 t = 
fork
 {
   x = 1;
}
assert
(x==0 || x==1);
join
 t;
assert
(x==1);
Versioned
<
int
> x = 0;
Revision
 r = 
rfork
 {
   x = 1;
}
assert
(x==0);
join
 r;
assert
(x==1);
Traditional Task
Concurrent Revisions
Isolation types:
declares shared data
fork revision:
forks off a private copy
of the shared state
join revision:
waits for the revision to
terminate and writes back
changes into the main revision
isolation:
Concurrent modifications
are not seen by others
No isolation:
We see either 0 or 1
depending on the
schedule
 
Puzzle time…
int
 x = 0;
int
 y = 0;
 
Task
 t = 
fork
 {
   
if
 (x==0) y++;
}
if
 (y==0) x++;
 
join
 t;
 
Hard to read:  let’s use a diagram instead…
Sequential consistency
int
 x = 0
int
 y = 0
if
 (y==0) x++;
if 
(x==0) y++;
assert
(   (x==0 && y==1)
       || (x==1 && y==0)
       || (x==1 && y==1));
 
What are the possible values for x  and y?
 
??
Transactional memory
int
 x = 0
int
 y = 0
atomic 
{
  if
 (y==0) x++;
}
atomic 
{
  if
 (x==0) y++;
}
assert
(   (x==0 && y==1)
       || (x==1 && y==0));
 
??
Concurrent revisions
Versioned
<
int
> x = 0
Versioned
<
int
> y = 0
if
 (y==0) x++;
if 
(x==0) y++;
assert
(x==1 && y==1);
Determinism
only 1 possible
result
Isolation
y
 is always 0
 
??
Isolation
and 
x
 is always 0
Conflict resolution
By default, on a write-write conflict
(only), the modification in the
child revision wins.
Versioned
<
int
> x;
Custom conflict resolution
x = 0
x += 1
x += 2
assert
(x==3)
merge(1,2,0)
 3
Cumulative
<
int
,
(main,join,orig).main + join – orig> x;
 
Demo
class
 
Sample
{
  [
Versioned
]
  
int
 i = 0;
 
  
public
 
void
 Run()
  {
   
 var 
r = 
CurrentRevision
.Fork(() => {
      i += 1;
    });
 
 i += 2;
 
    
CurrentRevision
.Join(r);
    Console
.WriteLine(
"i = "
 + i);
  }
}
Demo: Sandbox
class
 
Sandbox
{
  [
Versioned
]
  
int
 i = 0;
  
public void
 Run()
  {
    
var
 r = 
CurrentRevision
.Branch
("FlakyCode"
);
    
try
 {
      r.Run(() =>
      {
        i = 1;
        
throw new
 
Exception
("Oops");
      });
      
CurrentRevision
.Merge(r);
    }
    
catch
 {
      
CurrentRevision
.Abandon(r);
    }
    
Console
.WriteLine(
"\n i = "
 
+ i);
  }
}
Fork a revision without forking
an associated task/thread
Run code in a certain revision
Merge changes in a
revision into the main one
Abandon a revision and don’t
merge its changes.
 
A Software engineering perspective
 
Transactional memory:
Code
 centric: put “atomic” in the code
Granularity:
too broad: too many conflicts and no parallel
speedup
too small: potential races and incorrect code
 
Concurrent revisions:
Data
 centric: put annotations on the data
Granularity: group data that have mutual
constraints together, i.e. if (x + y > 0) should hold,
then x and y should be versioned together.
 
For each versioned object, maintain multiple
copies
Map revision ids to versions
`mostly’ 
lock-free 
array
 
New copies are allocated lazily
Don’t copy on fork… 
copy on first write
 after fork
Old copies are released on join
No space leak
 
Current Implementation: C# library
 
Full algorithm in the paper…
 
SpaceWars Game
 
Sequential Game Loop:
 
Revision Diagram for Parallelized Game Loop
 
Coll. Det. 1
 
Coll. Det. 2
 
Coll. Det. 3
 
Coll. Det. 4
 
Render
 
Physics
 
network
 
autosave
(long running)
 
“Problem Example 1”
 is solved
 
Render task 
reads
position of all game
objects
Physics task 
updates
position of all game
objects
No interference!
 
Coll. Det. 1
 
Coll. Det. 2
 
Coll. Det. 3
 
Coll. Det. 4
 
Render
 
Physics
 
network
 
autosave
(long running)
 
“Problem Example 2” 
is solved.
 
Physics task
 
updates
position of all game
objects
Network task
 updates
position of some
objects
Network updates have
priority over physics
updates
Order of joins
establishes
precedence!
 
Coll. Det. 1
 
Coll. Det. 2
 
Coll. Det. 3
 
Coll. Det. 4
 
Render
 
Physics
 
network
 
autosave
(long running)
Autosave now perfectly unnoticeable in background
Overall Speed-Up: 
3.03x on four-core 
(almost completely limited by graphics card)
Results
Physics task
Render
Collision detection
Overhead:
How much does all the copying and the indirection cost?
Only a 5% slowdown in
the sequential case
Some individual tasks
slow down much more
(i.e. physics simulation)
 
Revisions and Isolation Types simplify the
parallelization of applications with tasks that
Exhibit conflicting accesses to shared data
Have unpredictable latency
Have unpredictable data access pattern
May perform I/O that can not be rolled back
 
Revisions and Isolation Types are
easy to reason about (determinism, isolation)
have low-enough overhead for many applications
 
Conclusion
 
Questions?
 
 
daan@microsoft.com
sburckha@microsoft.com
 
External download available soon
int
 x = 0;
int
 y = 0;
task
 t = 
fork
 {
   
if
 (x==0) y++;
}
if
 (y==0) x++;
join
 t;
int
 x = 0;
int
 y = 0;
task
 t = 
fork
 {
  
atomic
 { 
if
 (x==0) y++; }
}
atomic
 { 
if
 (y==0) x++; }
join
 t;
versioned
<
int
> x = 0;
versioned
<
int
> y = 0;
revision
 r = 
rfork
 {
   
if
 (x==0) y++;
}
if
 (y==0) x++;
join
 r;
Sequential
Consistency
 
Transactional
Memory
 
Concurrent
Revisions
assert
(   (x==0 && y==1)
       || (x==1 && y==0)
       || (x==1 && y==1));
assert
(   (x==0 && y==1)
       || (x==1 && y==0));
assert
(x==1 && y==1);
x = 0
x += 1
x += 2
merge(1,2,0)
 3
x += 3
assert
( x==6 )
merge(3,5,2)
 6
3
5
2
By construction, there is no
‘global’ state: just local state for
each revision
State is simply a (partial)
function from a location to a
value
Operational Semantics
For some revision 
r
, with snapshot 
and local modifications 
 
and an
expression context with hole
 
(
x.e
)
 v
the state is a composition
of the root snapshot 
 and
local modifications 
On a 
fork
, the
snapshot of the new
revision 
r
is the
current state:  
::
On a 
join
, the writes of the
joinee 
r
take priority over
the writes of  the current
revision: 
::
’
Custom merge: per location (type)
On a 
join
, using a
merge function.
No conflict
 if a
location was not
written in the joinee
No conflict
 if a location
was unmodified in the
current revision, use
the value of the joinee
Conflict
 otherwise, use
a location/type specific
merge function
 
Standard merges:
What is a conflict?
Merge is only called 
if
:
(1) write in child, and
(2) modification in
main revision:
Cumulative
<
int
> x = 0
x += 2
x += 3
assert( x = 5 )
0
2
5
2
No conflict
(merge function
is not called)
No conflict
(merge function
is not called)
0
2
Merging with failure
On fail, we just ignore any
writes in the joinee
 
Snapshot isolation
 
Widely used in databases, for example Oracle
and Microsoft SQL
In essence, in snapshot isolation a concurrent
transaction can only complete in the absence
of write-write conflicts.
Our calculus generalizes snapshot isolation:
We support arbitrary nesting
We allow custom merge functions to resolve
write-write conflicts deterministically
Snapshot isolation
 
We can succinctly model snapshot isolation as:
Disallow nesting
Use the default merge:
 
Some versions of snapshot isolation do not treat
silent writes in a transaction as a conflict:
 
Sequential merges
 
We can view each location as an abstract data
types (i.e. object) with certain operations (i.e.
methods).
If a merge function always behaves as if
concurrent operations for those objects are
sequential, we call it a 
sequential merge.
Such objects always behave as if the
operations in the joinee are all done
sequentially at the join point.
 
Sequential merges
 
A merge is sequential if:
 
 
merge(
uw
1
(
o
),
 uw
2
(
o
),
 u
(
o
)
)
=
uw
1
w
2
(
o
)
 
And
  
uw
1
w
2
(
o
) 
 
u
w
1
w
2
merge(
uw
1
(
o
),
 uw
2
(
o
),
 u
(
o
)
)
x = o
 
Abelian merges
 
For any abstract data type that forms an
abelian group (associative, commutative, with
inverses) with neutral element 
0
 and an
operation 
,  the following merge is
sequential:
 
    merge
(
v
,
v
,
v
0
) = 
v
 
 v
v
0
 
This holds for example for additive integers
and additive sets.
Slide Note
Embed
Share

This content discusses a deterministic concurrency model called Concurrent Revisions, focusing on interactive applications with large shared data structures. It covers the challenges of conflicting tasks, conventional concurrency control methods, and proposes a programming model based on revisions and isolation types for efficient parallel execution.

  • Concurrency
  • Deterministic
  • Revision Model
  • Programming
  • Interactive Applications

Uploaded on Oct 05, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Concurrent Revisions: A deterministic concurrency model. Daan Leijen, Alexandro Baldassin, and Sebastian Burckhardt Microsoft Research (OOPSLA 2010)

  2. The concurrency elephant Task/Data Parallel: TPL, X10, Cilk, StreamIt, Cuda, OpenMP, etc. Concurrent: Thread, Locks, Promises, Transactions, etc. Our focus: Concurrent interactive applications with large shared data structures.

  3. Application = Shared Data and Tasks Example: Office application Save the document React to keyboard input by the user Perform a spellcheck in the background Exchange updates with remote users Reader Mutator Reader Shared Data Mutator Reader

  4. Beginning .NET Game Programming in C# Spacewars! About 15k lines of C# code, using DirectX. The original game is sequential

  5. Examples from SpaceWars Game Example 1: read-write conflict Render task reads position of all game objects Physics task updates position of all game objects => Render task needs to see consistent snapshot Example 2: write-write conflict Physics task updates position of all game objects Network task updates position of some objects => Network has priority over physics updates

  6. Conventional Concurrency Control Conflicting tasks can not efficiently execute in parallel. pessimistic concurrency control (i.e. locks) use locks to avoid parallelism where there are (real or potential) conflicts optimistic concurrency control (i.e. TM) speculate on absence of conflicts rollback if there are real conflicts either way: true conflicts kill parallelism.

  7. Our Proposed Programming Model: Revisions and Isolation Types Revision A logical unit of work that is forked and joined Isolation Type A type which implements automatic copying/merging of versions on write-write conflict Deterministic Conflict Resolution, never roll-back No restrictions on tasks (can be long-running, do I/O) Full concurrent reading and writing of shared data Clean semantics (see technical report) Fast and space-efficient runtime implementation

  8. No isolation: What s new We see either 0 or 1 depending on the schedule fork revision: forks off a private copy of the shared state Isolation types: declares shared data Concurrent Revisions Traditional Task int x = 0; Task t = fork { x = 1; } assert(x==0 || x==1); join t; assert(x==1); Versioned<int> x = 0; Revision r = rfork { x = 1; } assert(x==0); join r; assert(x==1); isolation: Isolation: side effects are only visible when the revision is joined. Deterministic execution! Concurrent modifications are not seen by others join revision: waits for the revision to terminate and writes back changes into the main revision

  9. Puzzle time int x = 0; int y = 0; Task t = fork { if (x==0) y++; } if (y==0) x++; join t; Hard to read: let s use a diagram instead

  10. Sequential consistency int x = 0 int y = 0 if (y==0) x++; if (x==0) y++; assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); What are the possible values for x and y? ??

  11. Transactional memory int x = 0 int y = 0 atomic { if (y==0) x++; } atomic { if (x==0) y++; } ?? assert( (x==0 && y==1) || (x==1 && y==0));

  12. Concurrent revisions Versioned<int> x = 0 Versioned<int> y = 0 Isolation and x is always 0 Isolation y is always 0 if (y==0) x++; if (x==0) y++; Determinism only 1 possible result assert(x==1 && y==1); ??

  13. Conflict resolution By default, on a write-write conflict (only), the modification in the child revision wins. Versioned<int> x; x = 0 x = 0 x = 0 x = 1 x = 2 x = 1 x = 0 x = 1 assert(x==2) assert(x==0) assert(x==1)

  14. Custom conflict resolution Cumulative<int, (main,join,orig).main + join orig> x; x = 0 0 x += 1 x += 2 merge(1,2,0) 3 1 2 assert(x==3)

  15. Demo class Sample { [Versioned] int i = 0; public void Run() { var r = CurrentRevision.Fork(() => { i += 1; }); i += 2; CurrentRevision.Join(r); Console.WriteLine("i = " + i); } }

  16. Demo: Sandbox Fork a revision without forking an associated task/thread class Sandbox { [Versioned] int i = 0; public void Run() { var r = CurrentRevision.Branch("FlakyCode"); try { r.Run(() => { i = 1; throw new Exception("Oops"); }); CurrentRevision.Merge(r); } catch { CurrentRevision.Abandon(r); } Console.WriteLine("\n i = " + i); } } Run code in a certain revision Merge changes in a revision into the main one Abandon a revision and don t merge its changes.

  17. A Software engineering perspective Transactional memory: Codecentric: put atomic in the code Granularity: too broad: too many conflicts and no parallel speedup too small: potential races and incorrect code Concurrent revisions: Data centric: put annotations on the data Granularity: group data that have mutual constraints together, i.e. if (x + y > 0) should hold, then x and y should be versioned together.

  18. Current Implementation: C# library For each versioned object, maintain multiple copies Map revision ids to versions `mostly lock-free array Revision Value 1 0 40 2 45 7 New copies are allocated lazily Don t copy on fork copy on first write after fork Old copies are released on join No space leak

  19. Full algorithm in the paper

  20. SpaceWars Game Parallel Collision Detection Parallel Collision Detection Parallel Collision Detection Render Screen Graphics Card Simulate Physics Sequential Game Loop: Play Sounds Shared State Send Process Inputs Receive Autosave Key- board Network Connection Disk

  21. Revision Diagram for Parallelized Game Loop Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 network Render Physics (long running) autosave

  22. Problem Example 1 is solved Render task reads position of all game objects Physics task updates position of all game objects No interference! Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 network Render Physics (long running) autosave

  23. Problem Example 2 is solved. Physics task updates position of all game objects Network task updates position of some objects Network updates have priority over physics updates Order of joins establishes precedence! Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 network Render Physics (long running) autosave

  24. Results Physics task Render Collision detection Autosave now perfectly unnoticeable in background Overall Speed-Up: 3.03x on four-core (almost completely limited by graphics card)

  25. Only a 5% slowdown in the sequential case Overhead: How much does all the copying and the indirection cost? Some individual tasks slow down much more (i.e. physics simulation)

  26. Conclusion Revisions and Isolation Types simplify the parallelization of applications with tasks that Exhibit conflicting accesses to shared data Have unpredictable latency Have unpredictable data access pattern May perform I/O that can not be rolled back Revisions and Isolation Types are easy to reason about (determinism, isolation) have low-enough overhead for many applications

  27. Questions? daan@microsoft.com sburckha@microsoft.com External download available soon

  28. Sequential Consistency int x = 0; int y = 0; task t = fork { if (x==0) y++; } if (y==0) x++; join t; assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); Concurrent Revisions Transactional Memory int x = 0; int y = 0; task t = fork { atomic { if (x==0) y++; } } atomic { if (y==0) x++; } join t; versioned<int> x = 0; versioned<int> y = 0; revision r = rfork { if (x==0) y++; } if (y==0) x++; join r; assert( (x==0 && y==1) || (x==1 && y==0)); assert(x==1 && y==1);

  29. x = 0 0 x += 1 x += 2 2 x += 3 merge(1,2,0) 3 1 2 3 5 merge(3,5,2) 6 assert( x==6 )

  30. By construction, there is no global state: just local state for each revision State is simply a (partial) function from a location to a value

  31. Operational Semantics For some revision r, with snapshot and local modifications and an expression context with hole( x.e) v the state is a composition of the root snapshot and local modifications On a join, the writes of the joinee r take priority over the writes of the current revision: :: On a fork, the snapshot of the new revision r is the current state: ::

  32. Custom merge: per location (type) No conflict if a location was not written in the joinee On a join, using a merge function. No conflict if a location was unmodified in the current revision, use the value of the joinee Conflict otherwise, use a location/type specific merge function Standard merges:

  33. What is a conflict? Merge is only called if: (1) write in child, and (2) modification in main revision: Cumulative<int> x = 0 0 x += 2 2 No conflict (merge function is not called) x += 3 0 2 2 5 No conflict (merge function is not called) assert( x = 5 )

  34. Merging with failure On fail, we just ignore any writes in the joinee

  35. Snapshot isolation Widely used in databases, for example Oracle and Microsoft SQL In essence, in snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts. Our calculus generalizes snapshot isolation: We support arbitrary nesting We allow custom merge functions to resolve write-write conflicts deterministically

  36. Snapshot isolation We can succinctly model snapshot isolation as: Disallow nesting Use the default merge: Some versions of snapshot isolation do not treat silent writes in a transaction as a conflict:

  37. Sequential merges We can view each location as an abstract data types (i.e. object) with certain operations (i.e. methods). If a merge function always behaves as if concurrent operations for those objects are sequential, we call it a sequential merge. Such objects always behave as if the operations in the joinee are all done sequentially at the join point.

  38. Sequential merges x = o A merge is sequential if: u merge(uw1(o), uw2(o), u(o)) = uw1w2(o) w1 w2 Anduw1w2(o) merge(uw1(o), uw2(o), u(o))

  39. Abelian merges For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation , the following merge is sequential: merge(v,v ,v0) = v v v0 This holds for example for additive integers and additive sets.

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#