Before I explain parallel computing, it's important to understand that
Suppose you have a lot of work to be done, and want to get it done much faster, so you hire 100 workers. If the work is 100 separate jobs that don't depend on each other, and they all take the same amount of time and can be easily parceled out to the workers, then you'll get it done about 100 times faster. This is so easy that it is called embarrassingly parallel. Just because it is embarrassing doesn't mean you shouldn't do it, and in fact it is probably exactly what you should do. A different option would be to parallelize each job, as discussed below, and then run these parallelizations one after another. However, as will be shown, this is probably a less efficient way of doing the work. Occasionally this isn't true because on computers, doing all of the job on one processor may require storing many results on disk, while the parallel job may spread the intermediate results in the RAM of the different processors. RAM is much faster than disks. However, if the program isn't spending a lot of time using the disk then embarrassingly parallel is the smart way to go. Assume this is what you should do unless you analyze the situation and determine that it isn't. Embarrassingly parallel is simple, and if you can get the workers do it for free then it is the cheapest solution as well.
What if the jobs take widely different amounts of time, but still have no interaction? If you have enough of them, just start handing them out until every worker has one, and then when workers are done they come back and request another. This will work fairly well, especially if you can put longer jobs first and shorter ones later. This form of parallelization is still pretty simple (unless it is teenagers that are doing the jobs - see "Do Teenagers Deserve Negative Wages?"). Many of the departmental computers that have more than one processor are run in this fashion, servicing jobs as they arrive. This is also how many airlines run their check-in queue: in queuing theory this is known as a "single queue multiple server" system, which is more efficient than the "multiple queue multiple server" systems common in checkout lines at grocery stores.
Suppose instead that this work you have is only a single job, but it takes a very long time. Now you have to do something to reorganize the job, somehow breaking it into pieces that can be done concurrently. For example, if the job is to build a house, it can be broken up into plumbing, electrical, etc. However, while many jobs can be done at the same time, some have specific orderings, such as putting in the foundation before the walls can go up. If all of the workers are there all of the time, then there will be periods when most of them are just waiting around for some task (such as the foundation) to be finished. Not very cost-effective, and you are not getting the job done 100 times faster. Such is the life of a parallel programmer.
Definition: Parallel computing is the use of two or more processors (cores, computers) in combination to solve a single problem.
The programmer has to figure out how to break the problem into pieces, and has to figure out how the pieces relate to each other. For example, a parallel program to play chess might look at all the possible first moves it could make. Each different first move could be explored by a different processor, to see how the game would continue from that point. At the end, these results have to be combined to figure out which is the best first move. Actually, the situation is even more complicated, because if the program is looking ahead several moves, then different starts can end up at the same board position. To be efficient, the program would have to keep track of this, so that if one processor had already evaluated that position, then others would not waste time duplicating the effort. This is how must parallel chess-playing systems work, including the famous IBM Deep Blue machine that beat Kasparov.
Other parallel computers, such as some of the systems from SGI and Cray, are shared-memory systems, where every processor can directly read and write data from every memory location. Think of this as being like everyone using a blackboard to do their calculations, where everyone has chalk and an eraser and can decide to write anywhere on the board. For these systems, there is a standard known as OpenMP, which is a collection of language extensions to Fortran and C/C++. Shared memory computers must use many non-commodity parts to support the sharing, and hence tend to be more expensive than distributed memory systems. However, it is often easier to write a program for shared memory systems than for distributed memory ones.
Yet other computers exploit graphics processors, GPUs, to achieve parallelism. However, while a GPU may have 100 cores, there are serious restrictions on their programability, and serious performance problems if there is much data movement. Further, the languages to use for GPUs are rapidly evolving, so it is unclear if you should use CUDA, OpenCL, OpenACC, accelerator extensions being incorporated in OpenMP, etc.
What has doubled, however, is the number of processors on a single chip. Some smartphones now have 8 cores (processors), you can buy CPU chips with 12 cores, and this number will soon increase. These are called "multi-core" or "many-core" chips. There are also graphics processing units (GPUs) with over 100 highly specialized processors. This is closer to what Moore was talking about, because what he really said was that the number of transistors would keep doubling. The rate of improvement will slow down, but significant increases in the number of transistors should continue for at least a decade.
If there are only a few cores in your computer then it is pretty easy to find tasks that can be done simultaneously, such as waiting for keystrokes and running a browser, i.e., embarrassingly parallel. However, as the number of processors keeps increasing, the parallelization problems mentioned above become increasingly serious. There are now parallel computers that have > 1,000,000 cores, and people are planning for ones that will have > 100,000,000.
Overall, this means that there is a massive need to make use of the parallelism in multi-core chips for almost any problem, and to use many of these combined together for large problems. Job security for me! (and you, if you learn parallel computing)
There are also resources available via the web - here are some pointers to parallel computing resources such as manuals, software, parallel computers, etc. Here's a video of an overview I presented at Michigan. The quality isn't very good, so I hope it was better in person.
Quentin F. Stout
Computer Science and Engineering
University of Michigan
Ann Arbor, MI 48109-2121
734-763-1518 (office, messages)
qstout @ umich · edu www.eecs.umich.edu/~qstout/
If you are close enough, "Hi" is an effective address.
|Copyright © 2000-2016 Quentin F. Stout|