diff --git a/week12/parallelism.md b/week12/parallelism.md index cf9da23..034269a 100644 --- a/week12/parallelism.md +++ b/week12/parallelism.md @@ -26,7 +26,20 @@ code { --- -# Parallelism vs. Concurrency +# Today: Parallelism + +- Parallelism vs Concurrency + - Workers, Processes, and Threads +- Multithreading +- Mutual Exclusion +- Atomics +- Message Passing + + +--- + + +# Parallelism vs Concurrency ## Concurrency @@ -39,6 +52,7 @@ code { * **Key difference:** Parallelism utilizes **multiple** workers + --- @@ -46,15 +60,11 @@ code { Parallelism divides tasks among workers. -* In hardwareland, we call these workers **processors** and **cores**. - -* In softwareland... - * "Processors" => Processes - * "Cores" => Threads - -* **Key difference:** Parallelism utilizes **multiple** processors/cores - * Some concurrency models don't! - +* In hardware, we call these workers **processors** and **cores** +* In software, workers are abstracted as **processes** and **threads** + * Processes contain many threads +* Parallelism requires **multiple** workers + * Concurrency models can have any number of workers! --- -# Example: The Main Thread + +# The Main Thread The thread that runs by default is the main thread. ```rust -for i in 1..5 { - println!("working on item {i} from the main thread!"); - thread::sleep(Duration::from_millis(1)); +fn main() { + for i in 1..8 { + println!("Main thread says: Hello {i}!"); + thread::sleep(Duration::from_millis(1)); + } } ``` +* So far, we have only been running code on the main thread! + --- -# Example: Creating a Thread +# Spawning a Thread -We can create ("spawn") more threads with `thread::spawn`: +We can create (spawn) more threads using `std::thread::spawn`. 
```rust -let handle = thread::spawn(|| { - for i in 1..10 { - println!("working on item {} from the spawned thread!", i); +fn main() { + let child_handle = thread::spawn(|| { + for i in 1..=8 { + println!("Child thread says: Hello {i}!"); + thread::sleep(Duration::from_millis(1)); + } + }); + + for i in 1..=3 { + println!("Main thread says: Hello {i}!"); + thread::sleep(Duration::from_millis(1)); + } +} +``` + + + + +--- + + +# Spawning a Thread + +```rust +let child_handle = thread::spawn(|| { + for i in 1..=8 { + println!("Child thread says: Hello {i}!"); thread::sleep(Duration::from_millis(1)); } }); -for i in 1..5 { - println!("working on item {i} from the main thread!"); +for i in 1..=3 { + println!("Main thread says: Hello {i}!"); thread::sleep(Duration::from_millis(1)); } ``` -* `thread::spawn` takes a closure as argument - * This is the function that the thread runs +* `thread::spawn` takes a `FnOnce` closure + * The closure will be run on the newly-created thread +* Question: What should this program's output be? + +--- + + +# Possible Output 1 + +Here is one possible program output: + +``` +Main thread says: Hello 1! +Child thread says: Hello 1! +Child thread says: Hello 2! +Main thread says: Hello 2! +Child thread says: Hello 3! +Main thread says: Hello 3! +Child thread says: Hello 4! +Child thread says: Hello 5! +``` + +--- + + +# Possible Output 2 + +Here is another possible program output: + +``` +Main thread says: Hello 1! +Child thread says: Hello 1! +Main thread says: Hello 2! +Main thread says: Hello 3 +``` + + + + --- -# Example: Creating a Thread -What is the output? + + +# Possible Output 3 + +And here is yet another possible program output! + ``` -working on item 1 from the main thread! -working on item 1 from the spawned thread! -working on item 2 from the main thread! -working on item 2 from the spawned thread! -working on item 3 from the main thread! -working on item 3 from the spawned thread! -working on item 4 from the main thread! 
-working on item 4 from the spawned thread! -working on item 5 from the spawned thread! +Main thread says: Hello 1! +Child thread says: Hello 1! +Main thread says: Hello 2! +Child thread says: Hello 2! +Main thread says: Hello 3! +Child thread says: Hello 3! ``` -* Where did the other four spawned threads go? - * The main thread ended before they executed. - * To prevent this, **join** threads to mandate executed. + +* What's going on here? + + +--- + + +# Multithreaded Code is Non-Deterministic + +* Most of the code we have written this semester has been deterministic +* The only counterexamples have been when we've used random number generators or interacted with I/O +* The executions of multiple threads can interleave with each other in unpredictable ways! --- -# Example: Joining Threads +# Multithreaded Code is Non-Deterministic -We join threads when we want to wait for a particular thread to finish execution: +In this example, the child thread can interleave its prints with the main thread. ```rust -let handle = thread::spawn(|| { - for i in 1..10 { - println!("working on item {} from the spawned thread!", i); +let child_handle = thread::spawn(|| { + for i in 1..=8 { + println!("Child thread says: Hello {i}!"); thread::sleep(Duration::from_millis(1)); } }); -for i in 1..5 { - println!("working on item {i} from the main thread!"); +for i in 1..=3 { + println!("Main thread says: Hello {i}!"); thread::sleep(Duration::from_millis(1)); } +``` -handle.join().unwrap(); +* Why doesn't the child thread always print "Hello" 8 times? + + +--- + + +# Process Exit `SIGKILL`s Threads + +* When the main thread finishes execution, the process it belongs to exits +* Once the process exits, all threads in the process are `SIGKILL`ed +* If we want to let our child threads finish, we have to wait for them! + + +--- + + +# Joining Threads + +We can `join` a thread when we want to wait for it to finish. + +```rust +// Spawn the child thread. 
+let child_handle = thread::spawn(|| time_consuming_function()); + +// Run some code on the main thread. +main_thread_foo(); + +// Wait for the child thread to finish running. +child_handle.join().unwrap(); ``` -* Blocks the current thread until the thread associated with `handle` finishes +* `join` will block the calling thread until the child thread finishes + * In this example, the calling thread is the main thread + --- -# Example: Multithreading Output +# Joining Threads + +If we go back to our original example and add a `join` on the child's handle at the end of the program, then the child thread can run to completion! -What is the output now? ``` -working on item 1 from the main thread! -working on item 2 from the main thread! -working on item 1 from the spawned thread! -working on item 3 from the main thread! -working on item 2 from the spawned thread! -working on item 4 from the main thread! -working on item 3 from the spawned thread! -working on item 4 from the spawned thread! -working on item 5 from the spawned thread! -working on item 6 from the spawned thread! -working on item 7 from the spawned thread! -working on item 8 from the spawned thread! -working on item 9 from the spawned thread! +Main thread says: Hello 1! +Child thread says: Hello 1! +Main thread says: Hello 2! +Child thread says: Hello 2! +Main thread says: Hello 3! +Child thread says: Hello 3! +Child thread says: Hello 4! +Child thread says: Hello 5! +Child thread says: Hello 6! +Child thread says: Hello 7! +Child thread says: Hello 8! ``` -* All ten spawned threads are executed! + --- -# Multithreading +# Example: Multithreaded Drawing Suppose we're painting an image to the screen, and we have eight threads. -* How should we divide the work? - * Divide image into eight regions - * Assign each thread to paint one region +How should we divide up the work between the threads? + +* Divide image into eight regions +* Assign each thread to paint one region * Easy! 
"Embarrassingly parallel" * Threads don't need to keep tabs on each other @@ -238,20 +364,21 @@ Each thread retires to their cave not too different from modern artists --> + --- -# The Case for Communication +# Example: Multithreaded Drawing -What if our image is more complex? +![bg right:45% 100%](../images/week12/circle-order-A-then-B.png) -* We're painting semi-transparent circles -* Circles overlap and are constantly moving -* The _order_ we paint circles affects the color mixing +![bg right:45% 100%](../images/week12/circle-order-B-then-A.png) -![bg right 100%](../images/week12/circle-order-A-then-B.png) +What if our image is more complex? -![bg right 100%](../images/week12/circle-order-B-then-A.png) +* We could be painting semi-transparent circles +* Circles could overlap and/or could be constantly moving +* The _order_ in which we paint circles changes the colors of pixels --- @@ -278,6 +412,7 @@ Now our threads need to talk to each other! **Problem:** How do threads communicate? **Solutions:** We'll discuss two approaches... + * Approach 1: Shared Memory * Approach 2: Message Passing @@ -291,27 +426,30 @@ Now our threads need to talk to each other! # Approach 1: Shared Memory -For each pixel, create a shared variable `x`: +For each pixel, create a shared variable `x` that represents the number of circles that overlap on this pixel: ```c -static int x = 0; // One per pixel +static int x = 0; // One per pixel. ``` +* _Note that this is C pseudocode: we'll explain the Rust way soon_ * When a thread touches a pixel, increment the pixel's associated `x` * Now each thread knows how many layers of paint there are on that pixel + + --- -# Shared Memory: Data Races +# Approach 1: Shared Memory Are we done? -Not quite... - -* Shared memory is an ingredient for **data races** -* Let's illustrate +* Not quite... +* Shared memory is prone to **data races** + --- # Shared Memory: Data Races -**Third ingredient**: when multiple threads update at once... 
+**Step 3**: Multiple threads update `x` at the same time... ```c -// Invariant: `x` is total number of times **any** thread has called `update_x` +// Invariant: `x` is total number of times **any** thread has called `update_x`. static int x = 0; static void update_x(void) { - int temp = x; // <- x is INCORRECT - temp += 1; // <- x is INCORRECT - x = temp; // <- x is CORRECT + x += 1; } + // + for (int i = 0; i < 20; ++i) { - spawn_thread(update_x); + spawn_thread(update_x); } ``` @@ -381,25 +529,26 @@ for (int i = 0; i < 20; ++i) { # Shared Memory: Data Races -**Third ingredient**: when multiple threads update at once...they interleave! +**Step 3**: When multiple threads update `x` at the same time... they interleave! -| Thread 1 | Thread 2 | -|---------------|---------------| -| temp = x | | -| | temp = x | -| temp += 1 | | -| | temp += 1 | -| x = temp | | -| | x = temp | +| Thread 1 | Thread 2 | +|-------------|-------------| +| `temp = x` | | +| | `temp = x` | +| `temp += 1` | | +| | `temp += 1` | +| `x = temp` | | +| | `x = temp` | + +<!-- +Note that this is just one possible way of incorrect interleaving. +--> --- @@ -407,23 +556,23 @@ A: Next slide # Shared Memory: Data Races -We want `x = 2`, but we get `x = 1`! +We want `x = 2`, but we could get `x = 1`! -| Thread 1 | Thread 2 | -|---------------|---------------| -| Read temp = 0 | | -| | Read temp = 0 | -| Set temp = 1 | | -| | Set temp = 1 | -| Set x = 1 | | -| | Set x = 1 | +| Thread 1 | Thread 2 | +|-----------------|-----------------| +| Read `temp = 0` | | +| | Read `temp = 0` | +| Set `temp = 1` | | +| | Set `temp = 1` | +| Set `x = 1` | | +| | Set `x = 1` | --- -# Shared Memory: Data Races +# Shared Memory: Atomicity We want the update to be **atomic**. That is, other threads cannot cut in mid-update. @@ -442,28 +591,28 @@ We want the update to be **atomic**. That is, other threads cannot cut in mid-update **Not Atomic** -| Thread 1 | Thread 2 | -|---------------|---------------| -| temp = x | | -| | temp = x | -| temp += 1 | | -| | temp += 1 | -| x = temp | | -| | x = temp | +| Thread 1 | Thread 2 | +|-------------|-------------| +| `temp = x` | | +| | `temp = x` | +| `temp += 1` | | +| | `temp += 1` | +| `x = temp` | | +| | `x = temp` |
**Atomic** -| Thread 1 | Thread 2 | -|---------------|---------------| -| temp = x | | -| temp += 1 | | -| x = temp | | -| | temp = x | -| | temp += 1 | -| | x = temp | +| Thread 1 | Thread 2 | +|-------------|-------------| +| `temp = x` | | +| `temp += 1` | | +| `x = temp` | | +| | `temp = x` | +| | `temp += 1` | +| | `x = temp` |
@@ -475,21 +624,26 @@ We want the update to be **atomic**. That is, other threads cannot cut in mid-up # Fixing a Data Race We must eliminate one of the following: -1. `x` is shared memory -2. `x` becomes incorrect mid-update +1. `x` is in shared memory +2. `x` temporarily becomes incorrect (mid-update) 3. Unsynchronized updates (parties can "cut in" mid-update) --- -# Approach 1: Mutual Exclusion +# **Mutual Exclusion** + + +--- + +# Approach 1: Mutual Exclusion Take turns! No "cutting in" mid-update. -1. `x` is shared memory -2. `x` becomes incorrect mid-update +1. `x` is in shared memory +2. `x` temporarily becomes incorrect (mid-update) 3. ~~Unsynchronized updates (parties can "cut in" mid-update)~~ @@ -498,9 +652,11 @@ Take turns! No "cutting in" mid-update. # Approach 1: Mutual Exclusion -We need to establish *mutual exclusion*, so that threads don't interfere with each other. -* Mutual exclusion means "Only one thread can do something at a time" -* A common tool for this is a mutex lock +We need to establish _mutual exclusion_. 
+ +* You can think of mutual exclusion as "Only one thread at a time" + * Mutual exclusion means threads don't interfere with each other +* A common tool for this is a `mutex` lock @@ -509,43 +665,45 @@ We need to establish *mutual exclusion*, so that threads don't interfere with ea --- -# Approach 1: Mutual Exclusion +# Mutual Exclusion in C + +Here is how you would use a `mutex` in C: ```c static int x = 0; -static mtx_t x_lock; - -static void thread(void) { - mtx_lock(&x_lock); - int temp = x; - temp += 1; - x = temp; - mtx_unlock(&x_lock); +static mtx_t x_lock; + +static void run_thread(void) { + mtx_lock(&x_lock); + x += 1; + mtx_unlock(&x_lock); } -// ``` -- Only one thread can hold the mutex lock (`mtx_t`) at a time -- This provides *mutual exclusion*--only one thread may access `x` at the same time +* Only one thread can hold the mutex lock (`mtx_t`) at a time +* Other threads block / wait until they get their turn to hold the lock +* Each thread gets "mutual exclusion" over `x` --- -# Approach 1: Mutual Exclusion -In Rust, the `Mutex` lock can be found in the standard library. +# Mutual Exclusion in Rust + +In Rust, the `Mutex` exists in the standard library! + ```rust use std::sync::Mutex; fn main() { - let m = Mutex::new(5); + let m: Mutex<i32> = Mutex::new(5); { let mut num = m.lock().unwrap(); - *num = 6; + *num += 1; } - println!("m = {m:?}"); + println!("m = {:?}", m); } ``` @@ -553,14 +711,53 @@ fn main() { --- -# Approach 2: Atomics +# `Mutex` +Rust's `Mutex` is a smart pointer! -One airtight update! Cannot be "incorrect" mid-update. +* `Mutex` owns the data it protects + * In C, you as the programmer have to ensure the mutex is locked correctly + * In Rust, you cannot access the data unless you hold the lock! +* Rust can achieve this using a `MutexGuard<'a, T>` + * You can think of a `Mutex` as the lock provider and the `MutexGuard` as the lock itself -1. `x` is shared memory -2. ~~`x` becomes incorrect mid-update~~ -3. 
Unsynchronized updates (parties can "cut in" mid-update) + +--- + + +# `MutexGuard<'a, T>` + +The `MutexGuard<'a, T>` created from the `Mutex` is also a smart pointer! + +```rust +let m: Mutex<i32> = Mutex::new(5); +{ + // Create a `MutexGuard` that gives the current thread exclusive access. + let guard1 = m.lock().expect("lock was somehow poisoned"); + + // Dereference the guard to get to the data. + let num = *guard1; + println!("{num}"); +} +// At the end of the scope, `guard1` gets dropped and the lock is released. + +// Since the first guard was dropped, we can take the lock again! +let guard2 = m.lock().expect("lock was somehow poisoned"); +``` + +* _Ask us after lecture if you're interested in why there is a [lifetime](https://doc.rust-lang.org/std/sync/struct.MutexGuard.html) there..._ + + + + +--- + + +# **Atomics** --- # Approach 2: Atomics -The compiler usually translates the following operation... +One airtight, _atomic_ update! Cannot be "temporarily incorrect" mid-update. + +1. `x` is in shared memory +2. ~~`x` temporarily becomes incorrect (mid-update)~~ +3. Unsynchronized updates (parties can "cut in" mid-update) + + +--- + + +# Code to Machine Instructions + +Recall that the compiler will usually translate the following operation... ```c x += 1; ``` -...into the machine instruction equivalent of this: +...into the machine instruction equivalent of this code: ```c int temp = x; @@ -586,7 +795,7 @@ x = temp; --- -# Approach 2: Atomics +# Atomics However, we can use an atomic operation like this: @@ -603,30 +812,33 @@ x += 1; * `fetch_and_add`: performs the operation suggested by the name, and returns the value that was previously in memory * Also `fetch_and_sub`, `fetch_and_or`, `fetch_and_and`, ... 
+ + --- -# Sneak Peak of CAS Atomic +# Aside: `compare_and_swap` -Other common atomic is `compare_and_swap` -* If the current value matches some old value, then write new value into memory - * Depending on variant, returns a boolean for whether new value was written into memory -* "Lock-free" programming: + +Another common atomic operation is `compare_and_swap` + +* If the current value matches an old value, update the value + * Returns whether or not the value was updated +* You can do "lock-free" programming with just CAS * No locks! Just `compare_and_swap` until we successfully write new value - * Not necessarily more performant than lock-based solutions - * Contention is bottleneck, not presence of locks - - + * _Not necessarily more performant than lock-based solutions_ + + + --- # Atomics @@ -647,11 +859,57 @@ println!("Atomic Output: {}!", x); # Atomics -Rust provides atomic primitive types, like `AtomicBool`, `AtomicI8`, `AtomicIsize`, etc. -* Provides a way to access values atomically from any thread - * Safe to share between threads implementing `Sync` -* We won't cover it further in this course, but the API is largely 1:1 with the C++20 atomics - * If interested in pitfalls, read up on *memory ordering* in computer systems +These atomic operations are also implemented in the Rust standard library. + +```rust +use std::sync::atomic::{AtomicI32, Ordering}; + +let x = AtomicI32::new(0); + +x.fetch_add(10, Ordering::SeqCst); +x.fetch_sub(2, Ordering::SeqCst); + +println!("Atomic Output: {}!", x.load(Ordering::SeqCst)); +``` + +* The API is largely identical to C++20 atomics + * If interested in the `Ordering`, research "memory ordering" + + + + +--- + + +# Atomic Action + +Here is an example of incrementing an atomic counter from multiple threads! + +```rust +static counter: AtomicUsize = AtomicUsize::new(0); + +fn main() { + // Spawn 100 threads that each increment the counter by 1. 
+ let handles: Vec<JoinHandle<usize>> = (0..100) + .map(|_| thread::spawn(|| counter.fetch_add(1, Ordering::Relaxed))) + .collect(); + + // Wait for all threads to finish. + handles.into_iter().for_each(|handle| { handle.join().unwrap(); }); + + assert_eq!(counter.load(Ordering::Relaxed), 100); +} +``` + + +--- + + +# **Message Passing** ---