Locking and Synchronization in Multithreaded Environments
Resources to check out
Linux Kernel Teaching: basic examples of common APIs
https://linux-kernel-labs.github.io/master/index.html
Rule of thumb for web-based tutorials/Wikipedia/Hacker News:
“I can see how this circuit can appear to work, but it clearly wasn't designed
by someone that really knew what they were doing. In general, if you see an
Arduino in a circuit you find on the 'net someplace, especially a simple one,
assume it was posted because the author considers it a great
accomplishment. Those that know what they are doing and draw out a circuit
like this in a minute don't consider it worth writing up a web page on. That
leaves those that took two weeks to get the motor to spin without the
transistor blowing up and they're not really sure what everything does to
write these web pages.”
- https://electronics.stackexchange.com/questions/95140/purpose-of-the-diode-and-capacitor-in-this-motor-circuit
Shared Memory Thread Synchronization
Threads cooperate in multithreaded environments
User threads and kernel threads
Share resources and data structures
E.g. memory cache in a web server
Coordinate execution
Producer/consumer model
For correctness, cooperation must be controlled
Must assume threads interleave execution arbitrarily
Scheduler cannot know everything
Control mechanism: synchronization
Allows restriction of interleaving
Note: This is a global issue (kernel and user)
Also in distributed systems
Shared Resources
Our focus: coordinating access to shared resources
Basic problem:
Two concurrent threads access a shared variable
Access method: read-modify-write
Two levels of approach:
Mechanisms for control
Low level: locks
Higher level: mutexes, semaphores, monitors, condition variables
Patterns for coordinating access
Bounded buffer, producer/consumer, …
The Classic Example
Managing a bank account:
Suppose the account is shared between 2 people
What happens if both people go to ATMs and deposit $$?

int balance;

void deposit(int amount) {
    balance += amount;
}

int withdraw(int amount) {
    balance -= amount;
    return balance;
}
 
Cooperating Requires Synchronization
void deposit(int amount) {
    balance += amount;
}
Deposit is implemented as a sequence of ASM instructions:
    Load  R1, balance
    Add   R1, amount
    Store R1, balance
What happens if two processes call deposit at the same time?
Process 1:                 Process 2:
    Load  R1, balance          Load  R1, balance
    Add   R1, amount           Add   R1, amount
    Store R1, balance          Store R1, balance
Race Condition: the result depends on operation interleaving
Interleaved Schedules
Execution of the two threads can be interleaved
Assume a uniprocessor w/ preemptive scheduling
Instruction sequence as seen by the CPU:
    Load  R1, balance      (Process 1)
    Add   R1, amount       (Process 1)
    --- Context Switch ---
    Load  R1, balance      (Process 2)
    Add   R1, amount       (Process 2)
    Store R1, balance      (Process 2)
    --- Context Switch ---
    Store R1, balance      (Process 1)
What’s the account balance after this sequence?
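A minimal user-space sketch of this race (my addition, not from the slides; plain C with pthreads): two threads each make 1,000,000 unsynchronized deposits, and the final balance usually comes up short because the read-modify-write interleaves.

#include <pthread.h>
#include <stdio.h>

static int balance = 0;

static void *depositor(void *arg)
{
    for (int i = 0; i < 1000000; i++)
        balance += 1;            /* read-modify-write: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, depositor, NULL);
    pthread_create(&t2, NULL, depositor, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %d (expected 2000000)\n", balance);  /* usually less */
    return 0;
}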
But doesn’t the x86 do this for us?
x86 allows memory operands:

CISC (x86):
    asm ("addl %1, %0"
         : "+m"(balance)
         : "r"(amount)
         : );

        |  (HW decoder)
        v

Load/Store arch (micro-ops):
    Load  R1, balance
    Add   R1, amount
    Store R1, balance

Answer: NO
x86 instructions are implemented using micro-ops in hardware
Instructions are decomposed into RISC instructions by the CPU decoding logic
x86 is load/store under the covers
So a “single” x86 instruction is actually a sequence of micro-instructions
The crux of the matter
Two concurrent threads access a shared resource without any synchronization
This creates a race condition
Output is non-deterministic and depends on timing
We need mechanisms for controlling access to shared resources
Allows us to reason about program operation
Re-introduces determinism
Synchronization is necessary for any shared data structure or resource
Mutual Exclusion
Mutual exclusion synchronizes access to shared resources
Code requiring mutual exclusion for correctness is a “critical section”
Only one thread at a time can execute in the critical section
All other threads must wait to enter
Only when a thread leaves can another enter
Scheduler assumptions
int i = 0;

Thread A:                        Thread B:
    while (i < 10) { i++; }          while (i > -10) { i--; }
    printf("A won!\n");              printf("B won!\n");

Who wins?
Is it guaranteed that someone wins?
Answer: We cannot know
There is randomness in the system (From where?)
Interrupts and the scheduler
Easy solution for uniprocessors
Problem == Preemption
Solution == Disable Preemption

Sources of preemption:
Voluntary preemption: controlled by thread
Involuntary preemption: controlled by scheduler or hardware
How?

Solution: disable interrupts and do not give up the CPU
On Linux w/ a single processor, all spinlocks are translated to:
    asm("cli");
and
    asm("sti");
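A minimal sketch of that idea (uniprocessor x86 only, kernel context; real kernels save and restore the flags register rather than unconditionally re-enabling interrupts):

/* With only one CPU, the sole source of interleaving is preemption,
 * and cli blocks it. This provides no protection at all on SMP. */
static inline void acquire(void) { asm volatile("cli" ::: "memory"); }
static inline void release(void) { asm volatile("sti" ::: "memory"); }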
Critical Section Requirements
Mutual exclusion
Only one process in critical section at a time
Progress (Deadlock free)
A process in a critical section cannot block
Cannot depend on other processes
Bounded (Starvation free)
Waiting processes must eventually proceed
Performance
Overhead of entering and exiting a critical section is small (relative to work done inside it)
Busy waiting can cause serious problems (spin wait)
Fair
Don’t make some processes wait longer than others
Implementing Critical Sections
To implement critical sections, we need atomic operations
Atomic operations: no other instructions can be interleaved
Examples of atomic operations
Memory accesses
Loads & stores
Cache coherency ensures atomicity
Code between interrupts on a single CPU/core
But interrupts can happen randomly…
Must explicitly disable interrupts
Special instructions
Test-and-Set
Compare-and-Swap
Etc…
But the x86 is not atomic!
The x86 offers a special instruction mode that forces atomicity
Only for a single instruction!!
Lock Prefix: forces all micro-ops of a single instruction to execute atomically
Asserts a lock signal on the memory bus
Disallows other CPUs/cores from accessing the memory region
    asm ("lock <instr>" ::: );

CISC (x86):
    asm ("addl %1, %0"
         : "+m"(balance)
         : "r"(amount)
         : );

        |  (HW decoder)
        v

Load/Store arch (micro-ops):
    Load  R1, balance
    Add   R1, amount
    Store R1, balance
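For example, a sketch of an atomic add using the lock prefix (GCC inline-asm syntax, following the constraint strings of the addl example above):

/* "lock addl" makes the whole load/add/store micro-op sequence atomic
 * with respect to other CPUs and cores. */
static inline void atomic_add(int *addr, int val)
{
    asm volatile("lock addl %1, %0"
                 : "+m"(*addr)
                 : "r"(val)
                 : "cc");
}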
Critical Section Building Blocks
OK, we have single-instruction atomicity… Now what?
Build higher-level synchronization in the OS. Abstraction!
Hardware building blocks (a small set of operations that absolutely must be correct!!):
    Memory accesses
    Test-and-set
    Compare-and-swap
    Disable interrupts
    Lock memory bus
Higher-level mechanisms built on top:
    Locks
    Semaphores
    Monitors
    Condition Variables
Mechanisms for Building Critical Sections
Locks
Very primitive with minimal semantics
Used heavily in the kernel
Semaphores
Basic
Easy to understand
Hard to use
Monitors
High level
Modern variants require language support (e.g. Java: synchronized)
Linux Kernel: Wait queues
Locks
Lock: an object in memory that provides 2 operations
acquire(): called before critical section entry
release(): called after critical section exit
Always called as a pair (responsibility of the programmer)
Between acquire() and release(), a thread “holds” the lock
acquire() does not return until the lock is held
Only one thread can hold the lock at a time
What happens when there is a bad programmer?
Two basic types of locks:
Spinlocks: acquire() busy waits (spins) until the lock is acquired
Mutexes: acquire() blocks (sleeps) until the lock is acquired
Using Locks
Unsynchronized:

int withdraw(int amount) {
    balance -= amount;
    return balance;
}

With a lock:

int withdraw(int amount) {
    acquire(lock);
    balance -= amount;
    release(lock);
    return balance;
}

Two threads interleaving:
    Thread 1: acquire(lock);
    Thread 1: balance -= amount;
    Thread 2: acquire(lock)      <- the 2nd acquire()
    Thread 1: release(lock);
    Thread 2: balance -= amount;
    Thread 2: release(lock);

What happens at the 2nd acquire()?
What would happen if this was a uniprocessor and:
    acquire(lock) => asm("cli");
    release(lock) => asm("sti");
Using locks badly
Why not this? Will this work correctly?

int withdraw(int amount) {
    balance -= amount;
    return balance;
}

int main() {
    int balance = 0;

    acquire(lock);
    balance = withdraw(10);
    release(lock);

    return balance;
}
Using locks badly
Why not this? Will this work correctly? (Same code as above.)
Rules of Thumb:
Always try to keep critical sections as small as possible
Helps performance, reduces bugs, easier to read, etc…
Keep locking logic as simple as possible
Reduces programmer error, bugs, and cognitive load
Another Bad Example
int foo() {
    int err = 0;

    acquire(lock);

    err = can_fail_1();
    if (err < 0) {
        release(lock);
        return err;
    }

    err = can_fail_2();
    if (err < 0) {
        release(lock);
        return err;
    }

    release(lock);
    return 0;
}
One fix: use goto so there is a single unlock-and-return path:
int foo() {
    int err = 0;

    acquire(lock);

    err = can_fail_1();
    if (err < 0) {
        goto err;
    }

    err = can_fail_2();
    if (err < 0) {
        goto err;
    }

err:
    release(lock);
    return err;
}
Another fix: factor the lock out of the fallible logic entirely:
static int __foo_unlocked(void) {
    int err = 0;

    err = can_fail_1();
    if (err < 0) {
        return err;
    }

    err = can_fail_2();
    if (err < 0) {
        return err;
    }

    return 0;
}

int foo() {
    int err = 0;

    acquire(lock);
    err = __foo_unlocked();
    release(lock);

    return err;
}
Locks without hardware support
It’s not pretty… Peterson’s algorithm
Assume 2 threads
tid -> thread ID (0 or 1)

int turn = 0;                      // shared variable
bool lock[2] = {false, false};     // shared variable

int withdraw(int amount) {
    lock[tid] = true;
    turn = 1 - tid;
    while (lock[1 - tid] && (turn == (1 - tid)))
        ;                          // spin
    balance -= amount;             // critical section
    lock[tid] = false;
    return balance;
}

The problem is that this is hard to understand!
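A caveat worth noting (my addition, not from the slides): plain loads and stores can be reordered by modern compilers and CPUs, which breaks Peterson’s algorithm as written. A faithful sketch uses C11 atomics, whose default sequentially-consistent ordering restores the assumptions the algorithm relies on:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int turn;
static atomic_bool interested[2];

void peterson_lock(int tid)
{
    atomic_store(&interested[tid], true);
    atomic_store(&turn, 1 - tid);
    while (atomic_load(&interested[1 - tid]) &&
           atomic_load(&turn) == 1 - tid)
        ;  /* spin: the other thread is interested and it is their turn */
}

void peterson_unlock(int tid)
{
    atomic_store(&interested[tid], false);
}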
Locks with hardware support
CPU provides atomic instructions
Note that x86 operations are NOT atomic by default; atomicity must be enabled explicitly
Two standard instructions: test-and-set, compare-and-swap
But there are others (see proj. 1)

The hardware executes the equivalent of the following as a single atomic instruction:

bool test_and_set(bool *flag) {
    bool old = *flag;
    *flag = true;
    return old;
}

int compare_and_swap(int *value, int cmp, int new) {
    int old = *value;
    if (*value == cmp) {
        *value = new;
    }
    return old;
}
Spinlocks
Basic form of mutual exclusion
Busy wait until the lock is available:
    while (!acquire(lock));
Problems:
Wastes resources while trying to acquire the lock
Under contention, utilization goes way down
Notes:
Only use for very short critical sections
Do not sleep with a lock held!
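A minimal sketch of a spinlock built from test-and-set, assuming GCC/Clang’s __atomic builtins (my choice of primitive; any atomic read-modify-write works):

#include <stdbool.h>

typedef struct { volatile bool locked; } spinlock_t;

static void spin_acquire(spinlock_t *l)
{
    /* atomically set the flag; keep spinning while the old value was
     * true, i.e. while someone else already holds the lock */
    while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
        ;
}

static void spin_release(spinlock_t *l)
{
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}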
Linux spinlocks
/* Initialize spinlock before use */
spin_lock_init(spinlock_t *lock);
DEFINE_SPINLOCK(name_of_lock);

/* Basic locking operations */
spin_lock(spinlock_t *lock);
spin_unlock(spinlock_t *lock);

/* Locking operations with IRQ disabling/enabling */
spin_lock_irqsave(spinlock_t *lock, unsigned long flags);
spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags);

Example:
static DEFINE_SPINLOCK(xxx_lock);
{
    unsigned long flags;

    spin_lock_irqsave(&xxx_lock, flags);
    ... critical section here ...
    spin_unlock_irqrestore(&xxx_lock, flags);
}
Mutexes
Basic form of mutual exclusion
Sleep until the lock is available:
    while (!acquire(lock)) { sleep(); }
Allows the lock holder to block (Why is that?)
Enables higher CPU utilization (no busy waiting!)
More overhead on high-contention locks
Linux Mutexes
#include <linux/mutex.h>

/* functions for mutex initialization */
void mutex_init(struct mutex *mutex);
DEFINE_MUTEX(name);

/* functions for mutex acquire */
void mutex_lock(struct mutex *mutex);

/* functions for mutex release */
void mutex_unlock(struct mutex *mutex);
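Putting the API together, a hedged usage sketch (kernel context; my_lock and shared_count are illustrative names, not from the slides):

#include <linux/mutex.h>

static DEFINE_MUTEX(my_lock);
static int shared_count;

void bump(void)
{
    mutex_lock(&my_lock);    /* may sleep, so process context only */
    shared_count++;          /* critical section */
    mutex_unlock(&my_lock);
}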
Reader writer locks
Like spinlocks, for read-mostly data
Reading is only sensitive if the data is being actively modified
If modifications are rare, it doesn’t make sense to serialize readers
Must make sure not to starve writers
Two acquire operations: acquire(read_lock) and acquire(write_lock)
Linux reader/writer locks
rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
{
    unsigned long flags;

    read_lock_irqsave(&xxx_lock, flags);
    .. critical section that only reads the info ...
    read_unlock_irqrestore(&xxx_lock, flags);

    write_lock_irqsave(&xxx_lock, flags);
    .. read and write exclusive access to the info ...
    write_unlock_irqrestore(&xxx_lock, flags);
}
Barriers
Synchronize execution into a sequence of stages
Mostly used in parallel computation (time steps)
Each thread waits until every other thread is at the same location:
    barrier_wait(&barrier);
Once every thread arrives, all continue
Ensures that execution occurs in lock step across all threads
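A user-space sketch using POSIX barriers (pthread_barrier_* is standard POSIX, though notably absent on macOS): three workers run two phases in lock step.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 3
static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    for (int phase = 0; phase < 2; phase++) {
        printf("thread %ld: phase %d\n", id, phase);
        pthread_barrier_wait(&barrier);   /* block until all arrive */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}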
Wait Queues
Locking for coarse-grain task coordination is bad
Busy waiting wastes resources
How do we use the time spent waiting to do other stuff?
Example: waiting for I/O events
Asynchronous events can happen any time: how do you detect them?
Naïve solution: polling on locked data structures
Very inefficient (but fast…)
Solution: Wait Queues
Sleep until an event arrives
Require a signal from another thread to indicate status has changed
Re-evaluate current state before continuing
Ties closely with the scheduler
Lots of options…
Linux Waitq’s
Which waitq you use depends on the use case

// wait_event - sleep until a condition gets true
wait_event(wq_head, condition)

// io_wait_event - like wait_event() but with io_schedule()
io_wait_event(wq_head, condition)

// wait_event_freezable - sleep (or freeze) until a condition gets true
wait_event_freezable(wq_head, condition)

// wait_event_timeout - sleep until a condition gets true or a timeout elapses
wait_event_timeout(wq_head, condition, timeout)

wait_event_freezable_timeout(wq_head, condition, timeout)
wait_event_interruptible(wq_head, condition)
wait_event_interruptible_timeout(wq_head, condition, timeout)
wait_event_hrtimeout(wq_head, condition, timeout)
wait_event_interruptible_hrtimeout(wq, condition, timeout)

// And many more…
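A hedged sketch of the usual pattern (kernel context; my_wq and data_ready are illustrative names): the waiter sleeps on a condition, the waker changes state and wakes the queue, and wait_event() re-checks the condition on wakeup.

#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(my_wq);
static int data_ready;

/* waiter (process context): sleeps until data_ready becomes nonzero */
void consumer(void)
{
    wait_event(my_wq, data_ready != 0);
    /* ... consume the data ... */
}

/* waker (e.g. interrupt handler or another thread) */
void producer(void)
{
    data_ready = 1;
    wake_up(&my_wq);   /* woken waiters re-evaluate the condition */
}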
RCU
Read-copy-update
Replacement for reader-writer locks
Does not actually use locks
Creates different versions of shared data
Modifying data creates a new copy
Future readers access new copy
Old readers still have old version
Must garbage collect
Only after all readers are done
But how do we know?
Quiescent state
Place requirements on reader contexts (they cannot sleep)
Therefore, if a CPU running a reader has slept, the reader must be done
Lots and lots of options
Mostly a side effect of implementation decisions
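A hedged sketch of the core RCU pattern (kernel context; struct my_data and gp are illustrative names): readers dereference under rcu_read_lock(), updaters publish a new copy and reclaim the old one only after a grace period.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_data { int a; };
static struct my_data __rcu *gp;

int reader(void)
{
    struct my_data *p;
    int val = 0;

    rcu_read_lock();                      /* cheap: never blocks writers */
    p = rcu_dereference(gp);
    if (p)
        val = p->a;
    rcu_read_unlock();
    return val;
}

void updater(int a)   /* assumes updaters are serialized elsewhere */
{
    struct my_data *new = kmalloc(sizeof(*new), GFP_KERNEL);
    struct my_data *old = rcu_dereference_protected(gp, 1);

    new->a = a;
    rcu_assign_pointer(gp, new);          /* publish the new version */
    synchronize_rcu();                    /* wait out pre-existing readers */
    kfree(old);                           /* now safe to reclaim */
}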
Linux RCU APIs
RCU list traversal: 
list_entry_rcu 
list_first_entry_rcu 
list_next_rcu 
list_for_each_entry_rcu 
list_for_each_entry_continue_rcu 
list_for_each_entry_from_rcu 
hlist_first_rcu 
hlist_next_rcu 
hlist_pprev_rcu 
hlist_for_each_entry_rcu 
hlist_for_each_entry_rcu_bh 
hlist_for_each_entry_from_rcu 
hlist_for_each_entry_continue_rcu 
hlist_for_each_entry_continue_rcu_bh 
hlist_nulls_first_rcu 
hlist_nulls_for_each_entry_rcu 
hlist_bl_first_rcu 
hlist_bl_for_each_entry_rcu
RCU pointer/list update: 
rcu_assign_pointer 
list_add_rcu 
list_add_tail_rcu 
list_del_rcu 
list_replace_rcu 
hlist_add_behind_rcu 
hlist_add_before_rcu 
hlist_add_head_rcu 
hlist_del_rcu 
hlist_del_init_rcu 
hlist_replace_rcu 
list_splice_init_rcu 
hlist_nulls_del_init_rcu 
hlist_nulls_del_rcu 
hlist_nulls_add_head_rcu 
hlist_bl_add_head_rcu 
hlist_bl_del_init_rcu 
hlist_bl_del_rcu 
hlist_bl_set_first_rcu
Linux RCU APIs

RCU:
    Critical sections: rcu_read_lock, rcu_read_unlock, rcu_dereference,
        rcu_read_lock_held, rcu_dereference_check, rcu_dereference_protected
    Grace period: synchronize_net, synchronize_rcu,
        synchronize_rcu_expedited, call_rcu, kfree_rcu
    Barrier: rcu_barrier

bh:
    Critical sections: rcu_read_lock_bh, rcu_read_unlock_bh,
        [local_bh_disable and friends], rcu_dereference_bh,
        rcu_dereference_bh_check, rcu_dereference_bh_protected,
        rcu_read_lock_bh_held
    Grace period: call_rcu, synchronize_rcu, synchronize_rcu_expedited
    Barrier: rcu_barrier
Linux RCU APIs

sched:
    Critical sections: rcu_read_lock_sched, rcu_read_unlock_sched,
        [preempt_disable and friends], rcu_read_lock_sched_notrace,
        rcu_read_unlock_sched_notrace, rcu_dereference_sched,
        rcu_dereference_sched_check, rcu_dereference_sched_protected,
        rcu_read_lock_sched_held
    Grace period: call_rcu, synchronize_rcu, synchronize_rcu_expedited
    Barrier: rcu_barrier

SRCU:
    Critical sections: srcu_read_lock, srcu_read_unlock, srcu_dereference,
        srcu_dereference_check, srcu_read_lock_held
    Grace period: call_srcu, synchronize_srcu, synchronize_srcu_expedited
    Barrier: srcu_barrier
Linux RCU APIs

SRCU:
    Initialization/cleanup: DEFINE_SRCU, DEFINE_STATIC_SRCU,
        init_srcu_struct, cleanup_srcu_struct

All:
    lockdep-checked RCU-protected pointer access: rcu_access_pointer,
        rcu_dereference_raw, RCU_LOCKDEP_WARN, rcu_sleep_check, RCU_NONIDLE
Transactional memory
Embed transaction processing into the memory system
Memory system tracks memory touched inside a critical section (transaction)
If another thread modifies the same data, the transaction aborts
Execution returns to the initial state
Memory contents reset to initial values
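As a concrete (and heavily hedged) illustration, not from the slides: Intel’s TSX/RTM exposes this through the _xbegin()/_xend()/_xabort() intrinsics in <immintrin.h>. The sketch assumes RTM-capable hardware and -mrtm; reading the fallback lock inside the transaction is what keeps the two paths mutually exclusive.

#include <immintrin.h>
#include <stdatomic.h>

static atomic_int fallback_lock;   /* 0 = free, 1 = held */
static int balance;

void deposit(int amount)
{
    if (_xbegin() == _XBEGIN_STARTED) {
        /* reading the lock adds it to the transaction's read set, so a
         * concurrent fallback acquisition aborts this transaction */
        if (atomic_load(&fallback_lock))
            _xabort(0xff);
        balance += amount;         /* tracked by the memory system */
        _xend();                   /* commit: all-or-nothing */
        return;
    }
    /* aborted (conflict, capacity, no TSX): take the fallback lock */
    while (atomic_exchange(&fallback_lock, 1))
        ;
    balance += amount;
    atomic_store(&fallback_lock, 0);
}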