Concurrent Revisions: A Model for Deterministic Concurrency

Concurrent Revisions:

A deterministic concurrency model.

Daan Leijen, Alexandro Baldassin,

and  Sebastian Burckhardt

Microsoft Research

(OOPSLA 2010)

The concurrency elephant

•

Task/Data Parallel

: TPL, X10, Cilk, StreamIt,

Cuda, OpenMP, etc.

•

Concurrent

: Thread, Locks, Promises,

Transactions, etc.

•

Our focus

: Concurrent interactive applications

with large shared data structures.

Application = Shared Data and Tasks

Shared Data

Reader

Mutator

Reader

Example: Office application

•

Save the document

•

React to keyboard input by the user

•

Perform a spellcheck in the background

•

Exchange updates with remote users

Mutator

Reader

Spacewars!

About 15k lines of C# code, using DirectX. The original game is sequential

Example 1:

read-write conflict



Render task

reads position of all game objects



Physics task

updates position of all game objects

=> Render task needs to see consistent snapshot

Example 2:

write-write conflict



Physics task

updates position of all game objects



Network task

updates position of some objects

=> Network has priority over physics updates

Examples from SpaceWars Game

Conflicting tasks can not efficiently execute in

parallel.



pessimistic concurrency control (i.e.

locks

•

use locks to avoid parallelism where

there are (real or potential) conflicts



optimistic concurrency control (i.e.

TM

•

speculate on absence of conflicts

rollback if there are real conflicts

  either way: true conflicts kill parallelism

Conventional Concurrency Control

Our Proposed Programming Model:

Revisions and Isolation Types

•

Deterministic Conflict Resolution

, never roll-back

•

No restrictions on tasks (can be long-running, do I/O)

•

Full concurrent reading and writing of shared data

•

Clean semantics (see technical report)

•

Fast and space-efficient runtime implementation

Revision

A logical unit of

work that is forked

and joined

Isolation Type

A type which implements

automatic  copying/merging of

versions on write-write conflict

What’s new

•

Isolation

: side effects

are only visible when the

revision is joined.

•

Deterministic

 execution!

int

 x = 0;

Task

t =

fork

   x = 1;

assert

(x==0 || x==1);

join

t;

assert

(x==1);

Versioned

int

> x = 0;

Revision

r =

rfork

   x = 1;

assert

(x==0);

join

r;

assert

(x==1);

Traditional Task

Concurrent Revisions

Isolation types:

declares shared data

fork revision:

forks off a private copy

of the shared state

join revision:

waits for the revision to

terminate and writes back

changes into the main revision

isolation:

Concurrent modifications

are not seen by others

No isolation:

We see either 0 or 1

depending on the

schedule

Puzzle time…

int

 x = 0;

int

 y = 0;

Task

t =

fork

if

 (x==0) y++;

if

 (y==0) x++;

join

t;

Hard to read:  let’s use a diagram instead…

Sequential consistency

int

 x = 0

int

 y = 0

if

 (y==0) x++;

if

(x==0) y++;

assert

(   (x==0 && y==1)

       || (x==1 && y==0)

       || (x==1 && y==1));

What are the possible values for x  and y?

??

Transactional memory

int

 x = 0

int

 y = 0

atomic

if

 (y==0) x++;

atomic

if

 (x==0) y++;

assert

(   (x==0 && y==1)

       || (x==1 && y==0));

??

Concurrent revisions

Versioned

int

> x = 0

Versioned

int

> y = 0

if

 (y==0) x++;

if

(x==0) y++;

assert

(x==1 && y==1);

Determinism

only 1 possible

result

Isolation

 is always 0

??

Isolation

and

 is always 0

Conflict resolution

By default, on a write-write conflict

(only), the modification in the

child revision wins.

Versioned

int

> x;

Custom conflict resolution

x = 0

x += 1

x += 2

assert

(x==3)

merge(1,2,0)



Cumulative

int



(main,join,orig).main + join – orig> x;

Demo

class

Sample

Versioned

int

 i = 0;

public

void

 Run()

var

r =

CurrentRevision

.Fork(() => {

      i += 1;

});

 i += 2;

CurrentRevision

.Join(r);

    Console

.WriteLine(

"i = "

 + i);

Demo: Sandbox

class

Sandbox

Versioned

int

 i = 0;

public void

 Run()

var

r =

CurrentRevision

.Branch

("FlakyCode"

);

try

      r.Run(() =>

        i = 1;

throw new

Exception

("Oops");

});

CurrentRevision

.Merge(r);

catch

CurrentRevision

.Abandon(r);

Console

.WriteLine(

"\n i = "

+ i);

Fork a revision without forking

an associated task/thread

Run code in a certain revision

Merge changes in a

revision into the main one

Abandon a revision and don’t

merge its changes.

A Software engineering perspective

•

Transactional memory:



Code

 centric: put “atomic” in the code



Granularity:

•

too broad: too many conflicts and no parallel

speedup

•

too small: potential races and incorrect code

•

Concurrent revisions:



Data

 centric: put annotations on the data



Granularity: group data that have mutual

constraints together, i.e. if (x + y > 0) should hold,

then x and y should be versioned together.

•

For each versioned object, maintain multiple

copies



Map revision ids to versions



`mostly’

lock-free

array

•

New copies are allocated lazily



Don’t copy on fork…

copy on first write

 after fork

•

Old copies are released on join



No space leak

Current Implementation: C# library

Full algorithm in the paper…

SpaceWars Game

Sequential Game Loop:

Revision Diagram for Parallelized Game Loop

Coll. Det. 1

Coll. Det. 2

Coll. Det. 3

Coll. Det. 4

Render

Physics

network

autosave

(long running)

“Problem Example 1”

 is solved



Render task

reads

position of all game

objects



Physics task

updates

position of all game

objects



No interference!

Coll. Det. 1

Coll. Det. 2

Coll. Det. 3

Coll. Det. 4

Render

Physics

network

autosave

(long running)

“Problem Example 2”

is solved.



Physics task

updates

position of all game

objects



Network task

 updates

position of some

objects



Network updates have

priority over physics

updates



Order of joins

establishes

precedence!

Coll. Det. 1

Coll. Det. 2

Coll. Det. 3

Coll. Det. 4

Render

Physics

network

autosave

(long running)



Autosave now perfectly unnoticeable in background



Overall Speed-Up:

3.03x on four-core

(almost completely limited by graphics card)

Results

Physics task

Render

Collision detection

Overhead:

How much does all the copying and the indirection cost?

Only a 5% slowdown in

the sequential case

Some individual tasks

slow down much more

(i.e. physics simulation)

Revisions and Isolation Types simplify the

parallelization of applications with tasks that



Exhibit conflicting accesses to shared data



Have unpredictable latency



Have unpredictable data access pattern



May perform I/O that can not be rolled back

Revisions and Isolation Types are



easy to reason about (determinism, isolation)



have low-enough overhead for many applications

Conclusion

Questions?

•

daan@microsoft.com

•

sburckha@microsoft.com

•

External download available soon

int

 x = 0;

int

 y = 0;

task

t =

fork

if

 (x==0) y++;

if

 (y==0) x++;

join

t;

int

 x = 0;

int

 y = 0;

task

t =

fork

atomic

if

 (x==0) y++; }

atomic

if

 (y==0) x++; }

join

t;

versioned

int

> x = 0;

versioned

int

> y = 0;

revision

r =

rfork

if

 (x==0) y++;

if

 (y==0) x++;

join

r;

Sequential

Consistency

Transactional

Memory

Concurrent

Revisions

assert

(   (x==0 && y==1)

       || (x==1 && y==0)

       || (x==1 && y==1));

assert

(   (x==0 && y==1)

       || (x==1 && y==0));

assert

(x==1 && y==1);

x = 0

x += 1

x += 2

merge(1,2,0)



x += 3

assert

( x==6 )

merge(3,5,2)



By construction, there is no

‘global’ state: just local state for

each revision

State is simply a (partial)

function from a location to a

value

Operational Semantics

For some revision

, with snapshot



and local modifications



and an

expression context with hole



x.e

the state is a composition

of the root snapshot



and

local modifications



On a

fork

, the

snapshot of the new

revision

’

is the

current state:



::



On a

join

, the writes of the

joinee

’

take priority over

the writes of  the current

revision:



::

’

Custom merge: per location (type)

On a

join

, using a

merge function.

No conflict

 if a

location was not

written in the joinee

No conflict

 if a location

was unmodified in the

current revision, use

the value of the joinee

Conflict

 otherwise, use

a location/type specific

merge function

Standard merges:

What is a conflict?

•

Merge is only called

if

(1) write in child, and

(2) modification in

main revision:

Cumulative

int

> x = 0

x += 2

x += 3

assert( x = 5 )

No conflict

(merge function

is not called)

No conflict

(merge function

is not called)

Merging with failure

On fail, we just ignore any

writes in the joinee

Snapshot isolation

•

Widely used in databases, for example Oracle

and Microsoft SQL

•

In essence, in snapshot isolation a concurrent

transaction can only complete in the absence

of write-write conflicts.

•

Our calculus generalizes snapshot isolation:



We support arbitrary nesting



We allow custom merge functions to resolve

write-write conflicts deterministically

Snapshot isolation

We can succinctly model snapshot isolation as:

•

Disallow nesting

•

Use the default merge:

Some versions of snapshot isolation do not treat

silent writes in a transaction as a conflict:

Sequential merges

•

We can view each location as an abstract data

types (i.e. object) with certain operations (i.e.

methods).

•

If a merge function always behaves as if

concurrent operations for those objects are

sequential, we call it a

sequential merge.

•

Such objects always behave as if the

operations in the joinee are all done

sequentially at the join point.

Sequential merges

•

A merge is sequential if:

merge(

uw

),

uw

),

uw

•

And

uw

 

merge(

uw

),

uw

),

x = o

Abelian merges

•

For any abstract data type that forms an

abelian group (associative, commutative, with

inverses) with neutral element

 and an

operation

,  the following merge is

sequential:

    merge

’

) =



’



•

This holds for example for additive integers

and additive sets.

Slide Note

Embed Share

Download

This content discusses a deterministic concurrency model called Concurrent Revisions, focusing on interactive applications with large shared data structures. It covers the challenges of conflicting tasks, conventional concurrency control methods, and proposes a programming model based on revisions and isolation types for efficient parallel execution.

hus_br Follow

Uploaded on Oct 05, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Concurrent Revisions: A deterministic concurrency model. Daan Leijen, Alexandro Baldassin, and Sebastian Burckhardt Microsoft Research (OOPSLA 2010)

The concurrency elephant Task/Data Parallel: TPL, X10, Cilk, StreamIt, Cuda, OpenMP, etc. Concurrent: Thread, Locks, Promises, Transactions, etc. Our focus: Concurrent interactive applications with large shared data structures.

Application = Shared Data and Tasks Example: Office application Save the document React to keyboard input by the user Perform a spellcheck in the background Exchange updates with remote users Reader Mutator Reader Shared Data Mutator Reader

Beginning .NET Game Programming in C# Spacewars! About 15k lines of C# code, using DirectX. The original game is sequential

Examples from SpaceWars Game Example 1: read-write conflict Render task reads position of all game objects Physics task updates position of all game objects => Render task needs to see consistent snapshot Example 2: write-write conflict Physics task updates position of all game objects Network task updates position of some objects => Network has priority over physics updates

Conventional Concurrency Control Conflicting tasks can not efficiently execute in parallel. pessimistic concurrency control (i.e. locks) use locks to avoid parallelism where there are (real or potential) conflicts optimistic concurrency control (i.e. TM) speculate on absence of conflicts rollback if there are real conflicts either way: true conflicts kill parallelism.

Our Proposed Programming Model: Revisions and Isolation Types Revision A logical unit of work that is forked and joined Isolation Type A type which implements automatic copying/merging of versions on write-write conflict Deterministic Conflict Resolution, never roll-back No restrictions on tasks (can be long-running, do I/O) Full concurrent reading and writing of shared data Clean semantics (see technical report) Fast and space-efficient runtime implementation

No isolation: What s new We see either 0 or 1 depending on the schedule fork revision: forks off a private copy of the shared state Isolation types: declares shared data Concurrent Revisions Traditional Task int x = 0; Task t = fork { x = 1; } assert(x==0 || x==1); join t; assert(x==1); Versioned<int> x = 0; Revision r = rfork { x = 1; } assert(x==0); join r; assert(x==1); isolation: Isolation: side effects are only visible when the revision is joined. Deterministic execution! Concurrent modifications are not seen by others join revision: waits for the revision to terminate and writes back changes into the main revision

Puzzle time int x = 0; int y = 0; Task t = fork { if (x==0) y++; } if (y==0) x++; join t; Hard to read: let s use a diagram instead

Sequential consistency int x = 0 int y = 0 if (y==0) x++; if (x==0) y++; assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); What are the possible values for x and y? ??

Transactional memory int x = 0 int y = 0 atomic { if (y==0) x++; } atomic { if (x==0) y++; } ?? assert( (x==0 && y==1) || (x==1 && y==0));

Concurrent revisions Versioned<int> x = 0 Versioned<int> y = 0 Isolation and x is always 0 Isolation y is always 0 if (y==0) x++; if (x==0) y++; Determinism only 1 possible result assert(x==1 && y==1); ??

Conflict resolution By default, on a write-write conflict (only), the modification in the child revision wins. Versioned<int> x; x = 0 x = 0 x = 0 x = 1 x = 2 x = 1 x = 0 x = 1 assert(x==2) assert(x==0) assert(x==1)

Custom conflict resolution Cumulative<int, (main,join,orig).main + join orig> x; x = 0 0 x += 1 x += 2 merge(1,2,0) 3 1 2 assert(x==3)

Demo class Sample { [Versioned] int i = 0; public void Run() { var r = CurrentRevision.Fork(() => { i += 1; }); i += 2; CurrentRevision.Join(r); Console.WriteLine("i = " + i); } }

Demo: Sandbox Fork a revision without forking an associated task/thread class Sandbox { [Versioned] int i = 0; public void Run() { var r = CurrentRevision.Branch("FlakyCode"); try { r.Run(() => { i = 1; throw new Exception("Oops"); }); CurrentRevision.Merge(r); } catch { CurrentRevision.Abandon(r); } Console.WriteLine("\n i = " + i); } } Run code in a certain revision Merge changes in a revision into the main one Abandon a revision and don t merge its changes.

A Software engineering perspective Transactional memory: Codecentric: put atomic in the code Granularity: too broad: too many conflicts and no parallel speedup too small: potential races and incorrect code Concurrent revisions: Data centric: put annotations on the data Granularity: group data that have mutual constraints together, i.e. if (x + y > 0) should hold, then x and y should be versioned together.

Current Implementation: C# library For each versioned object, maintain multiple copies Map revision ids to versions `mostly lock-free array Revision Value 1 0 40 2 45 7 New copies are allocated lazily Don t copy on fork copy on first write after fork Old copies are released on join No space leak

Full algorithm in the paper

SpaceWars Game Parallel Collision Detection Parallel Collision Detection Parallel Collision Detection Render Screen Graphics Card Simulate Physics Sequential Game Loop: Play Sounds Shared State Send Process Inputs Receive Autosave Key- board Network Connection Disk

Revision Diagram for Parallelized Game Loop Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 network Render Physics (long running) autosave

Problem Example 1 is solved Render task reads position of all game objects Physics task updates position of all game objects No interference! Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 network Render Physics (long running) autosave

Problem Example 2 is solved. Physics task updates position of all game objects Network task updates position of some objects Network updates have priority over physics updates Order of joins establishes precedence! Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 network Render Physics (long running) autosave

Results Physics task Render Collision detection Autosave now perfectly unnoticeable in background Overall Speed-Up: 3.03x on four-core (almost completely limited by graphics card)

Only a 5% slowdown in the sequential case Overhead: How much does all the copying and the indirection cost? Some individual tasks slow down much more (i.e. physics simulation)

Conclusion Revisions and Isolation Types simplify the parallelization of applications with tasks that Exhibit conflicting accesses to shared data Have unpredictable latency Have unpredictable data access pattern May perform I/O that can not be rolled back Revisions and Isolation Types are easy to reason about (determinism, isolation) have low-enough overhead for many applications

Questions? daan@microsoft.com sburckha@microsoft.com External download available soon

Sequential Consistency int x = 0; int y = 0; task t = fork { if (x==0) y++; } if (y==0) x++; join t; assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); Concurrent Revisions Transactional Memory int x = 0; int y = 0; task t = fork { atomic { if (x==0) y++; } } atomic { if (y==0) x++; } join t; versioned<int> x = 0; versioned<int> y = 0; revision r = rfork { if (x==0) y++; } if (y==0) x++; join r; assert( (x==0 && y==1) || (x==1 && y==0)); assert(x==1 && y==1);

x = 0 0 x += 1 x += 2 2 x += 3 merge(1,2,0) 3 1 2 3 5 merge(3,5,2) 6 assert( x==6 )

By construction, there is no global state: just local state for each revision State is simply a (partial) function from a location to a value

Operational Semantics For some revision r, with snapshot and local modifications and an expression context with hole( x.e) v the state is a composition of the root snapshot and local modifications On a join, the writes of the joinee r take priority over the writes of the current revision: :: On a fork, the snapshot of the new revision r is the current state: ::

Custom merge: per location (type) No conflict if a location was not written in the joinee On a join, using a merge function. No conflict if a location was unmodified in the current revision, use the value of the joinee Conflict otherwise, use a location/type specific merge function Standard merges:

What is a conflict? Merge is only called if: (1) write in child, and (2) modification in main revision: Cumulative<int> x = 0 0 x += 2 2 No conflict (merge function is not called) x += 3 0 2 2 5 No conflict (merge function is not called) assert( x = 5 )

Merging with failure On fail, we just ignore any writes in the joinee

Snapshot isolation Widely used in databases, for example Oracle and Microsoft SQL In essence, in snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts. Our calculus generalizes snapshot isolation: We support arbitrary nesting We allow custom merge functions to resolve write-write conflicts deterministically

Snapshot isolation We can succinctly model snapshot isolation as: Disallow nesting Use the default merge: Some versions of snapshot isolation do not treat silent writes in a transaction as a conflict:

Sequential merges We can view each location as an abstract data types (i.e. object) with certain operations (i.e. methods). If a merge function always behaves as if concurrent operations for those objects are sequential, we call it a sequential merge. Such objects always behave as if the operations in the joinee are all done sequentially at the join point.

Sequential merges x = o A merge is sequential if: u merge(uw1(o), uw2(o), u(o)) = uw1w2(o) w1 w2 Anduw1w2(o) merge(uw1(o), uw2(o), u(o))

Abelian merges For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation , the following merge is sequential: merge(v,v ,v0) = v v v0 This holds for example for additive integers and additive sets.

Concurrent Revisions: A Model for Deterministic Concurrency

Download Presentation

Presentation Transcript

Related

More Related Content