Quentin F. Stout
Bruce Wagar

Computer Science and Engineering, University of Michigan

**Extended Abstract:**
Hypercube algorithms are developed for a variety of communication-intensive
tasks such as transposing a matrix, histogramming,
one node sending a (long) message to another,
broadcasting a message from one node to all others,
each node broadcasting a message to all others,
and nodes exchanging messages via a fixed permutation.
Some of these are similar to the MPI operations
MPI_BCAST, MPI_SEND, MPI_GATHER, MPI_SCATTER, MPI_AllGATHER, MPI_ALLTOALL.
The algorithm for exchanging via a fixed permutation can be viewed as a
deterministic analogue of Valiant's randomized routing.

The algorithms are for hypercubes in which local processing time is
ignored, communication time predominates, message headers are not needed
because all nodes know the task being performed,
and all nodes can use all communication links simultaneously.
Thus the time of an operation is determined by the use of the communication
links, so we call the model *link-bound*.
We assume that the time to send a message of length
*m* is αm + β, where β represents the start-up and
shut-down time of the message, and α represents bandwidth constraints.

Through systematic use of techniques such as pipelining, batching, variable packet sizes, symmetrizing, and completing, for all problems algorithms are obtained which achieve a time with an optimal highest-order term. In several cases we believe that they are the absolutely optimal algorithm.

While the algorithms are optimized for hypercube computers, many of the techniques apply to a wide range of distributed memory computers.

We also show that one must be a careful in determining lower bounds, for we give an algorithm for broadcasting which is faster than a claimed lower bound for this problem. The lower bound erroneously assumed that one can get a lower bound by adding a lower bound that incorporates bandwidth considerations and one that incorporates message startup considerations.

**Keywords:**
parallel computing, collective comunication,
hypercube computer, n-cube, parallel communication, all-to-all
communication, personalized communication, broadcasting, routing, permutations,
matrix transpose, histogramming, distributed memory, message passing

**Complete paper.**
This paper appears in
*Journal of Parallel and Distributed Computing* **10** (1990),
pp. 167-181.
A preliminary version appeared as
``Passing messages in link-bound hypercubes'',
*Hypercube Mutiprocessors 1987*, M. Heath, ed., pp. 251-257.

**Related work:**

Here is an examination of measured all-to-all performance on an IBM SP-2, which has very restricted communication and a poorly tuned operating system.

A modest
explanation of parallel computing, a
tutorial, Parallel Computing 101, and a
list of parallel computing resources.

An
overview of our work, and
relevant papers.

Copyright © 2004-2022 Quentin F. Stout. |