Quentin F. Stout
Computer Science and Engineering, University of Michigan
Extended Abstract: Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposing a matrix, histogramming, one node sending a (long) message to another, broadcasting a message from one node to all others, each node broadcasting a message to all others, and nodes exchanging messages via a fixed permutation. Some of these are similar to the MPI operations MPI_BCAST, MPI_SEND, MPI_GATHER, MPI_SCATTER, MPI_ALLGATHER, and MPI_ALLTOALL. The algorithm for exchanging via a fixed permutation can be viewed as a deterministic analogue of Valiant's randomized routing.
The algorithms are for hypercubes in which local processing time is ignored, communication time predominates, message headers are not needed because all nodes know the task being performed, and all nodes can use all communication links simultaneously. Thus the time of an operation is determined by the use of the communication links, so we call the model link-bound. We assume that the time to send a message of length m is αm + β, where β represents the start-up and shut-down time of the message, and α represents bandwidth constraints.
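As a concrete illustration of this cost model (a minimal sketch, not code from the paper), the following Python fragment computes the αm + β cost of a single link transfer and the time of the simplest broadcast on a d-cube, in which an informed node forwards the whole message across one new dimension per round; the function names and the simulation are illustrative assumptions only.

```python
def send_time(m, alpha, beta):
    """Link-bound model: sending a message of length m over one link
    takes alpha*m + beta (alpha = bandwidth term, beta = startup)."""
    return alpha * m + beta

def naive_broadcast_time(d, m, alpha, beta):
    """Recursive-doubling broadcast on a d-cube: in round i every
    informed node forwards the entire message across dimension i,
    so the informed set doubles each round and the total time is
    d * (alpha*m + beta)."""
    informed = {0}                       # node 0 holds the message
    for i in range(d):
        informed |= {node ^ (1 << i) for node in informed}
    assert len(informed) == 2 ** d       # every node is now informed
    return d * send_time(m, alpha, beta)
```

For example, on a 3-cube with m = 100, α = 1, β = 5, this naive scheme takes 3 · (100 + 5) = 315 time units; the paper's point is that techniques such as pipelining improve the highest-order term of such naive schemes.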
Through systematic use of techniques such as pipelining, batching, variable packet sizes, symmetrizing, and completing, we obtain algorithms for all of these problems whose times have an optimal highest-order term. In several cases we believe the algorithms are absolutely optimal.
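To make the pipelining technique concrete, here is a hedged Python sketch (an illustration under generic assumptions, not the paper's algorithms): a message of length m is split into k packets and pipelined through d communication stages, so the last packet arrives after k + d − 1 packet-steps of cost α(m/k) + β each; balancing the two terms suggests a near-optimal packet count.

```python
from math import sqrt

def pipelined_time(m, d, k, alpha, beta):
    """Time to pipeline a length-m message as k packets through
    d stages: (k + d - 1) packet-steps, each alpha*(m/k) + beta."""
    return (k + d - 1) * (alpha * m / k + beta)

def best_packets(m, d, alpha, beta):
    """Setting the derivative to zero gives the balance point
    k* = sqrt((d - 1) * alpha * m / beta); since k must be an
    integer, compare the neighbors of k* and return the better."""
    k_star = max(1.0, sqrt((d - 1) * alpha * m / beta))
    candidates = {max(1, int(k_star)), int(k_star) + 1}
    return min(candidates,
               key=lambda k: pipelined_time(m, d, k, alpha, beta))
```

With k = 1 this reduces to the unpipelined time d(αm + β); for long messages the optimized k drives the dominant term down toward αm plus lower-order terms, which is the flavor of improvement the paper pursues.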
While the algorithms are optimized for hypercube computers, many of the techniques apply to a wide range of distributed memory computers.
We also show that one must be careful in determining lower bounds, for we give an algorithm for broadcasting which is faster than a claimed lower bound for this problem. The claimed bound erroneously assumed that a valid lower bound can be obtained by adding a lower bound based on bandwidth considerations to one based on message-startup considerations.
Keywords: parallel computing, collective communication, hypercube computer, n-cube, parallel communication, all-to-all communication, personalized communication, broadcasting, routing, permutations, matrix transpose, histogramming, distributed memory, message passing
Complete paper. This paper appears in Journal of Parallel and Distributed Computing 10 (1990), pp. 167-181. A preliminary version appeared as ``Passing messages in link-bound hypercubes'', Hypercube Multiprocessors 1987, M. Heath, ed., pp. 251-257.
Here is an examination of measured all-to-all performance on an IBM SP-2, which has very restricted communication and a poorly tuned operating system.
A modest explanation of parallel computing, a tutorial, Parallel Computing 101, and a list of parallel computing resources.
An overview of our work, and relevant papers.
|Copyright © 2004-2022 Quentin F. Stout.|