Parallel algorithms use multiple processors to solve a problem faster.
Idea: split the work into independent tasks that can run at the same time.
We can use the fork–join model:
spawn = create a parallel task
sync = wait for tasks to finish
Key challenge: more processors doesn’t always mean faster — depends on how tasks are divided.
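A minimal sketch of spawn/sync in runnable Go (the sum calls are made-up stand-in tasks; a goroutine plays the role of spawn, and WaitGroup.Wait plays the role of sync):

    package main

    import (
        "fmt"
        "sync"
    )

    // sum adds the integers in [lo, hi]; a stand-in for any independent task.
    func sum(lo, hi int) int {
        total := 0
        for i := lo; i <= hi; i++ {
            total += i
        }
        return total
    }

    func main() {
        var a int
        var wg sync.WaitGroup
        wg.Add(1)
        go func() { // "spawn": run this task concurrently
            defer wg.Done()
            a = sum(1, 1000)
        }()
        b := sum(1001, 2000) // the current goroutine keeps working
        wg.Wait()            // "sync": wait for the spawned task to finish
        fmt.Println(a + b)   // 2001000
    }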
2. How Do We Measure Efficiency? (1 min)
Work (T1): total number of operations (like runtime on one core).
Span (T∞): longest chain of dependent tasks — the “critical path.”
Parallelism: T1 / T∞ = maximum possible speedup.
Goal: keep the span small and balance the work across processors.
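A quick worked example with made-up numbers: if an algorithm has work T1 = 10,000 operations and span T∞ = 100 operations, its parallelism is T1 / T∞ = 100, so roughly 100 processors can be kept busy; adding more beyond that cannot make it any faster.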
3. Example: Parallel Fibonacci (1.5 min)
parallel_fib(n):
    if n <= 2: return 1
    else:
        x = spawn parallel_fib(n-1)
        y = parallel_fib(n-2)
        sync
        return x + y
Work: T1(n) = O(2^n) (huge, exponential).
Span: T∞(n) = O(n) (linear).
Parallelism: O(2^n / n) — looks amazing on paper.
But in practice it's inefficient: the naive recursion does far too much redundant work, and the spawn-based structure doesn't lend itself to memoization.
This example just shows how span vs work can differ dramatically.
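The same structure as a runnable Go sketch (a goroutine stands in for spawn, WaitGroup.Wait for sync; the name parallelFib just follows the pseudocode):

    package main

    import (
        "fmt"
        "sync"
    )

    // parallelFib mirrors the pseudocode above: the n-1 branch is spawned
    // into its own goroutine, the n-2 branch runs in the current goroutine,
    // and wg.Wait() is the sync before combining the results.
    func parallelFib(n int) int {
        if n <= 2 {
            return 1
        }
        var x int
        var wg sync.WaitGroup
        wg.Add(1)
        go func() { // spawn
            defer wg.Done()
            x = parallelFib(n - 1)
        }()
        y := parallelFib(n - 2)
        wg.Wait() // sync
        return x + y
    }

    func main() {
        fmt.Println(parallelFib(20)) // 6765
    }

Even for modest n this spawns a huge number of tiny tasks, which is exactly the redundancy problem noted above.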
4. Example: Parallel Merge Sort (1.5–2 min)
Idea:
Split the array into halves in parallel.
Recursively sort each half in parallel.
Merge results in parallel.
Work: O(n log n) (same as regular merge sort).
Span: O(log^2 n) (each merge has O(log n) span, across O(log n) levels).
Parallelism: O(n / log n) → excellent for large inputs.
Unlike Fibonacci, this one actually scales well in practice.
(Draw recursion tree splitting arrays, with arrows showing parallel merges.)
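A runnable Go sketch of the splitting step (the names parallelMergeSort and merge are mine; note the merge below is sequential, so this sketch has span O(n) per level rather than the O(log^2 n) total quoted above, which additionally requires a parallel merge):

    package main

    import "fmt"

    // parallelMergeSort sorts the two halves concurrently (the goroutine is
    // the spawn, receiving on done is the sync), then merges the results.
    func parallelMergeSort(a []int) []int {
        if len(a) <= 1 {
            return a
        }
        mid := len(a) / 2
        var left []int
        done := make(chan struct{})
        go func() { // spawn: sort the left half in parallel
            left = parallelMergeSort(a[:mid])
            close(done)
        }()
        right := parallelMergeSort(a[mid:])
        <-done // sync
        return merge(left, right)
    }

    // merge combines two sorted slices; this version is sequential.
    func merge(l, r []int) []int {
        out := make([]int, 0, len(l)+len(r))
        i, j := 0, 0
        for i < len(l) && j < len(r) {
            if l[i] <= r[j] {
                out = append(out, l[i])
                i++
            } else {
                out = append(out, r[j])
                j++
            }
        }
        out = append(out, l[i:]...)
        return append(out, r[j:]...)
    }

    func main() {
        fmt.Println(parallelMergeSort([]int{5, 2, 9, 1, 7, 3})) // [1 2 3 5 7 9]
    }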
5. Wrap-Up (30 sec)
Parallel algorithms are about reducing the span so many processors can work together.
Work is like total effort; span is like bottleneck time.
Fibonacci shows theory vs practice; merge sort shows the right balance.
In real systems, overhead from spawning too many tiny tasks must be managed — that’s where scheduling strategies come in.
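A common way to manage that overhead is a sequential cutoff (grain size): below some threshold, stop spawning and just recurse sequentially. A minimal sketch on the Fibonacci example, with the cutoff value of 20 picked arbitrarily:

    package main

    import (
        "fmt"
        "sync"
    )

    const cutoff = 20 // arbitrary grain size; tune for the machine and workload

    // fibSeq is the plain sequential recursion used below the cutoff.
    func fibSeq(n int) int {
        if n <= 2 {
            return 1
        }
        return fibSeq(n-1) + fibSeq(n-2)
    }

    // parallelFibCutoff only spawns when the subproblem is large enough to
    // be worth the overhead of creating a parallel task.
    func parallelFibCutoff(n int) int {
        if n <= cutoff {
            return fibSeq(n)
        }
        var x int
        var wg sync.WaitGroup
        wg.Add(1)
        go func() { // spawn only for large subproblems
            defer wg.Done()
            x = parallelFibCutoff(n - 1)
        }()
        y := parallelFibCutoff(n - 2)
        wg.Wait() // sync
        return x + y
    }

    func main() {
        fmt.Println(parallelFibCutoff(35)) // 9227465
    }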