Message Routing in Faulty Hypercube Computers

Hypercube Message Routing in the Presence of Faults

Jesse M. Gordon Quentin F. Stout
University of Michigan

Abstract: We analyze routing messages on hypercube computers which have faulty processors and/or communication links. We consider the basic problem of routing a single message from an arbitrary source to an arbitrary destination. Here ``fault'' can be quite general, including, for example, a link currently unavailable due to congestion.

A framework for the analysis of fault tolerant routing schemes is presented. It includes differing routing schemes, routing information, and fault distribution models. Using random routing, under the one-step local information routing model, we show that the a priori probability of successful message routing is high even for an exceedingly large number of faults. We also analyze the behavior of ``sidetracking'', a routing method which combines the concepts of local information and randomization. The empirical performance of the sidetracking algorithm indicates strongly that, in the limit as the cube dimension grows, for any fixed probability of node failure, the probability of successful routing converges to 1.

Keywords: message routing, fault tolerance, resilience, hypercube computer, parallel computing, randomized routing, random graph

Complete paper. This appears in Proc. 3rd Conf. Hypercube Concurrent Computers and Applications (1988), ACM, pp. 318-327. It was digitized by ACM.

Related Work

J. Gordon, Efficient Schemes for Massively Fault Tolerant Parallel Communication, PhD thesis, U. Michigan, 1990.
An overview of work of myself and students, and relevant papers, in the area of parallel computing.
A modest explanation of parallel computing, a tutorial, Parallel Computing 101, and a list of parallel computing resources.