Statistical Analysis of Communication Time on the IBM SP2

Theodore B. Tabe     Janis Hardwick     Quentin F. Stout
University of Michigan


Abstract: For parallel computers, the execution time of communication routines is an important determinant of the performance users obtain. We measured the MPL and MPI performance of the IBM SP2 and observed that higher-level collective communication routines such as MPI_ALLTOALL show a drop in performance as the number of processors involved in the communication increases. While a few others have also studied the SP2's communication performance, they have reported only average performance, and have neither commented on the drop in performance nor determined its causes.

We generated a distribution of times for these routines and developed a simulator in an attempt to recreate the observed distribution. By studying distributions of communication times and by refining the simulator, we were able to discern that the performance decrease is due to the variation in the communication times of the lower-level send-receive primitives upon which the higher-level communication routines are built. This variation is in turn caused by the deleterious effects of interrupts generated by an operating system (AIX) that is not tuned for high-performance parallel computing. The interrupts degrade performance in an additive manner, spreading their effects throughout the system. This behavior is sometimes known as jitter, and it must be eliminated if systems are to use thousands of processors efficiently.
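The mechanism described above, in which a collective built from point-to-point steps finishes only when its slowest participant does, can be illustrated with a toy Monte Carlo model. This is a minimal sketch, not the authors' simulator: it assumes an all-to-all made of p - 1 synchronized pairwise rounds, a fixed per-send cost `base`, and an OS interrupt that strikes each processor in each round independently with probability `q`, adding a fixed `delay`. All function names and parameter values are hypothetical.

```python
import random
import statistics


def alltoall_time(p, base=1.0, q=0.05, delay=10.0, rng=None):
    """Simulated completion time of one all-to-all on p >= 2 processors.

    Hypothetical model: the exchange proceeds in p - 1 synchronized
    pairwise rounds. In each round every processor does one send/receive
    costing `base`; with probability `q` an OS interrupt adds `delay`.
    A round ends only when its slowest processor finishes.
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(p - 1):
        # Each processor's cost this round, including possible interrupt jitter.
        costs = [base + (delay if rng.random() < q else 0.0) for _ in range(p)]
        total += max(costs)  # the round waits for the slowest processor
    return total


def mean_round_time(p, trials=500, seed=0, **kw):
    """Average per-round time over many simulated all-to-alls (seeded)."""
    rng = random.Random(seed)
    times = [alltoall_time(p, rng=rng, **kw) / (p - 1) for _ in range(trials)]
    return statistics.mean(times)


def expected_round_time(p, base=1.0, q=0.05, delay=10.0):
    """Closed form under the same model: a round is slow if any processor is hit."""
    return base + delay * (1.0 - (1.0 - q) ** p)


if __name__ == "__main__":
    for p in (4, 16, 64):
        print(p, round(mean_round_time(p), 2), round(expected_round_time(p), 2))
```

Under these assumptions the expected round time is base + delay * (1 - (1 - q)^p). With q = 0.05 and delay = 10, a round costs about 2.9 time units on 4 processors but about 10.6 on 64: an interrupt that rarely hits any single processor becomes almost certain to hit some processor in every round, so the per-processor jitter accumulates round after round.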

Our results were obtained for IBM's MPL message-passing library, which is currently the most highly tuned of the communication libraries available. However, other measurements show that the same results hold for the MPI (Message Passing Interface) library.

Keywords: collective communication, performance evaluation, all-to-all, MPI_ALLTOALL, MPI_SEND, benchmarking, message passing, parallel computer, communication overhead, operating system jitter, interrupts, heavy tail distribution

This paper appears in Computing Science and Statistics 27 (1995), pp. 347-351.



Copyright © 2001-2016 Quentin F. Stout