Tech Report CS-90-05

Fault-Tolerant Computing on Trees

Ajit Agrawal

March 1990 (Revised 9/90)

Abstract:

Much attention has been given recently to the resilience of different networks to faults. Hypercubes have been shown to be robust in presence of a constant fraction of random node and edge failures. Constant-degree networks like the meshes have been analyzed with a view to wafer-scale integration in the presence of faults. Hypercube derivative networks have been shown to be robust in the presence of node failures, though the more difficult case of edge failures remains unsolved.

Here we analyze the fault-tolerant computing issues on a complete binary tree. Binary trees are well-known networks, and their structure makes them susceptible to edge faults. Analysis of binary trees is important in its own right as well as in analyzing many other tree derivative networks like the fat-trees, tree of meshes, pyramids, etc.

Given a complete binary tree with faulty components, we show how to simulate the fault-free complete binary tree under different models of faults. If nodes of an N-node complete binary tree fail randomly with constant probability $p < 1/2$, then the faulty tree can simulate the fault-free tree with $O (\log \log N)$ slowdown, with high probability.

If both edges and nodes fail with independent probability $p < 1/2$, we show that $ \Omega ( { ( \frac{N}{\log N} ) } ^ { \log ( \frac{1}{1~-~p} ) } ) $ is a lower bound on the simulation time of the fault-free tree by the faulty tree. We also show an almost strict upper bound on simulation time that is within a $ \log ^ 2 N$ factor of the lower bound for $p < 1/3$. The above results hold with high probability. The technique presented here, of bounding the size of the largest connected component in presence of edge faults, is new and can be extended to other related networks.

(complete text in pdf)