EECS 587, Parallel Computing

Professor: Quentin F. Stout


You Can't Hide

It is almost impossible to buy a computer that isn't parallel. Even an Apple Watch has 2 cores. Some graphics processing chips (GPUs) have more than 1000 specialized compute cores, and some supercomputers incorporate these chips. The Tianhe-2 supercomputer has more than 3,000,000 cores and a theoretical peak performance of more than 50 petaflops, i.e., 50,000,000,000,000,000 floating point operations per second. The number of cores per chip will continue to increase, so parallel computing is needed to make use of whatever system you buy. This is especially true for compute-intensive tasks such as simulations, analyzing large amounts of data, deep learning, and optimizing complicated systems.

Parallel Computers

Parallel computers are easy to build - it's the software that takes work.


Typically about half the class is from Computer Science and Engineering, and half is from a wide range of other areas throughout the sciences, engineering, and medicine. Some CSE students want to become researchers in parallel computing or cloud computing, while students outside CSE typically intend to apply parallel computing to their discipline, working on problems too large to be solved via serial codes written for single processors. Most students are graduate students, but there are also a few undergraduates and postdocs.

Satisfying Degree Requirements

This course can be used to satisfy requirements in a variety of degree programs.


The course covers a wide range of aspects of parallel computing, with the emphasis on developing efficient software for commonly available systems, such as the clusters here on campus. We will emphasize using standard software, including MPI (Message Passing Interface) for distributed memory systems, OpenMP for shared memory systems, and CUDA for GPUs (graphics processing units). Using standard software helps make the code more portable, preserving the time invested in it. Because there are several parallel computing models, you also have to learn a bit about parallel architectures, since there is significant hardware/software interaction. This includes aspects such as shared vs. distributed memory, fine-grain vs. medium-grain parallelism, MIMD vs. SIMD, etc.

We examine many reasons for poor parallel performance, and a wide range of techniques for improving it. Concepts covered include domain decomposition for geometric and graph-based problems; deterministic, probabilistic, and adaptive load balancing; and synchronization. You'll learn why modest parallelization is relatively easy to achieve, and why efficient massive parallelization is quite difficult - in particular, you will continually encounter Amdahl's Law. Examples and programs will be numeric, such as simple stencil calculations for PDEs on a grid, and nonnumeric, such as a very simple discrete event simulation. They do not assume much knowledge of any field, and you'll be given all of the information you need to understand each problem. The focus of the class is on techniques and analyses that cut across applications.

Here is a somewhat whimsical overview of parallel computing.

Work required

The grade is based on computer programming projects, a few written homeworks, and a final project of your choosing (though I must approve it). The final project is about 1/2 of the grade. Often students can integrate this project with their other work, so, for example, it may be part of their thesis. Many students have used this project to start a new research area. Several of the projects have led to publications, and a few have resulted in awards (Best Thesis, Gordon Bell, finalist for Best Paper at the Supercomputing conference, etc.). If you don't have any ideas for a project, I'll help you find some.


Prerequisites

Programming ability in C, C++, or Fortran, plus a willingness to rethink how problems should be solved. You do not need any prior experience with parallel computing or supercomputing. Since the course is "application neutral," you don't need a strong background in computer science, nor do you need to know numerical analysis. If you don't have much programming experience, you should consider taking EECS 402, Computer Programming for Scientists and Engineers, offered every Winter semester.


Textbook

None, but we'll use some online computer manuals and tutorials, as well as some papers and small sections of texts. (You could always buy my book, but it is not very relevant to the course since it focuses on abstract models of parallelism and algorithms. None of the algorithms in it are appropriate for the computers we'll be using.)


Some parallel computing resources available on the Web.

Homework Assignment


Copyright © 2000-2019 Quentin F. Stout