Fault Tolerance of Allocation Schemes in Massively Parallel Computers
Dept. of Computer Science, Southern Illinois University at Edwardsville
Quentin F. Stout
EECS Department, University of Michigan
Generalizing the buddy and Gray-coded systems, we introduce a new family of allocation schemes which exhibits a significant improvement in fault tolerance over the existing schemes and which uses relatively few additional resources. For purposes of comparison, we study the behavior of the various schemes on the allocation of subsystems of 2^18 processors in the hypercube, mesh, and torus consisting of 2^20 processors. Our methods involve a combination of analytical techniques and simulation.
Keywords: fault tolerance, processor allocation, hypercube computer, mesh, torus, buddy system, parallel computing, supercomputing, graph theory, computer science, resource allocation, scheduling
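To make the two baseline schemes named in the abstract concrete, the following is a minimal sketch of the usual textbook formulation of buddy and single Gray-code subcube recognition in a hypercube. It is not the new family of schemes the paper introduces, and it is not taken from the paper. All function names (gray, buddy_subcubes, gray_subcubes, is_subcube, survives) are hypothetical, and the sketch assumes nodes are labeled 0 .. 2^n - 1, that the Gray code is the binary reflected one, and that a scheme "survives" a fault set if at least one subcube it recognizes contains no faulty node.

```python
def gray(x: int) -> int:
    """Binary reflected Gray code of x."""
    return x ^ (x >> 1)


def buddy_subcubes(n: int, k: int):
    """Yield the 2^(n-k) k-subcubes recognized by the buddy scheme:
    aligned blocks of 2^k consecutive node addresses."""
    for j in range(1 << (n - k)):
        yield frozenset(range(j << k, (j + 1) << k))


def gray_subcubes(n: int, k: int):
    """Yield the k-subcubes recognized by a single Gray-code scheme:
    runs of 2^k consecutive Gray-code positions (taken cyclically)
    starting at multiples of 2^(k-1). Assumes 1 <= k <= n."""
    size = 1 << n
    half = 1 << (k - 1)
    for j in range(size // half):
        start = j * half
        yield frozenset(gray((start + i) % size) for i in range(1 << k))


def is_subcube(nodes, k: int) -> bool:
    """True if `nodes` is exactly a k-dimensional subcube: 2^k distinct
    addresses that differ from one another in only k bit positions."""
    nodes = list(nodes)
    mask = 0
    for v in nodes:
        mask |= v ^ nodes[0]
    return len(set(nodes)) == (1 << k) and bin(mask).count("1") == k


def survives(subcubes, faults) -> bool:
    """True if at least one recognized subcube avoids every faulty node."""
    faults = set(faults)
    return any(sc.isdisjoint(faults) for sc in subcubes)


if __name__ == "__main__":
    n, k = 4, 3
    faults = {0, 8}  # illustrative fault set, not from the paper
    print("buddy survives:", survives(buddy_subcubes(n, k), faults))
    print("gray survives: ", survives(gray_subcubes(n, k), faults))
```

In this small case (n = 4, k = 3, faulty nodes 0 and 8) both buddy subcubes contain a fault, while the Gray-code scheme still recognizes the fault-free subcube {4, 5, 6, 7, 12, 13, 14, 15}. Under the same formulation, for subsystems of 2^18 processors in a hypercube of 2^20 processors the buddy scheme recognizes 4 subcubes and the single Gray-code scheme recognizes 8.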
Copyright © 2005-2017 Quentin F. Stout