Abstract: We analyze routing messages on hypercube computers which have faulty processors and/or communication links. We consider the basic problem of routing a single message from an arbitrary source to an arbitrary destination. Here ``fault'' can be quite general, including, for example, a link currently unavailable due to congestion.
A framework for the analysis of fault tolerant routing schemes is presented. It includes differing routing schemes, routing information, and fault distribution models. Using random routing, under the one-step local information routing model, we show that the a priori probability of successful message routing is high even for an exceedingly large number of faults. We also analyze the behavior of ``sidetracking'', a routing method which combines the concepts of local information and randomization. The empirical performance of the sidetracking algorithm indicates strongly that, in the limit as the cube dimension grows, for any fixed probability of node failure, the probability of successful routing converges to 1.
Keywords: message routing, fault tolerance, resilience, hypercube computer, parallel computing, randomized routing, random graph
Complete paper. This appears in Proc. 3rd Conf. Hypercube Concurrent Computers and Applications (1988), ACM, pp. 318-327. It was digitized by ACM.
|Copyright © 2006-2018 Quentin F. Stout|