Tech Report CS-91-54

Parallel Algorithms with Processor Failures and Delays

Jonathan F. Buss, Paris C. Kanellakis, Prabhakar L. Radge and Alex A. Shvartsman

August 1991

Abstract:

We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and strongly asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared memory; the complexity measures are {\em completed work} (where processors are charged for completed fixed-size update cycles) and {\em overhead ratio} (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serialization of the actions of the processors determined by an on-line adversary; the complexity measure is {\em total work} (number of steps taken by all processors). Despite their differences, the two models share key algorithmic techniques.

We present new algorithms for the {\em Write-All} problem (in which $P$ processors write ones into an array of size $N$) for the two models, and a third which runs on the restartable fail-stop model. These algorithms can be used to implement a simulation strategy for any $N$-processor PRAM on a restartable fail-stop $P$-processor CRCW PRAM such that it guarantees a terminating execution of each simulated $N$-processor step, with $O(\log^2 N)$ overhead ratio and $O(\min \{N +P \log^2 N+ M \log N$, $N \cdot P^{0.59}\})$ (sub-quadratic) completed work (where $M$ is the number of failures during this step's simulation). This strategy is work-optimal when the number of simulating processors is $P \leq N/\log^2 N$ and the total number of failures per each simulated $N$ processor step is $O(N/\log N)$. We also show that the {\em Write-All\/} requires $N-P+ \Omega (P \log P)$ completed/total work on these models for $P\leq N$.

(complete text in pdf)