TY - GEN
T1 - Empirical comparison of heuristic load distribution in point-to-point multicomputer networks
AU - Grunwald, Dirk C.
AU - Nazief, Bobby A.A.
AU - Reed, Daniel A.
PY - 1990/1/1
Y1 - 1990/1/1
N2 - To benefit from parallel computers, programs must be partitioned into units that work in parallel. Once partitioned, these units, called processes, tasks or threads, must be assigned to specific processors for execution. On shared memory architectures, this is termed scheduling, while on multicomputer systems, it is called load distribution, distinguishing it from any local scheduling used on individual nodes. We compared several load placement algorithms using instrumented programs and synthetic program models. Salient characteristics of these program traces (total computation time, total number of messages sent and average message size) span two orders of magnitude. Load distribution algorithms determine the initial placement for processes, a precursor to the more general problem of load redistribution. We simulated a modern architecture with point-to-point communication; however, our findings should apply to wormhole networks as well. We found that information is usually better than inference for driving process placement. The strategies we examine use load or status information to select placement locations; this information is explicitly disseminated and also piggybacked on normal communication. We found that extant point-to-point networks reduce the rate of information dissemination because transiting messages are ignored by intermediate nodes. From these studies, we have concluded that desirable workload distribution strategies will place new processes globally, rather than locally, to spread processes rapidly, but that local information should be used to refine global placement.
AB - To benefit from parallel computers, programs must be partitioned into units that work in parallel. Once partitioned, these units, called processes, tasks or threads, must be assigned to specific processors for execution. On shared memory architectures, this is termed scheduling, while on multicomputer systems, it is called load distribution, distinguishing it from any local scheduling used on individual nodes. We compared several load placement algorithms using instrumented programs and synthetic program models. Salient characteristics of these program traces (total computation time, total number of messages sent and average message size) span two orders of magnitude. Load distribution algorithms determine the initial placement for processes, a precursor to the more general problem of load redistribution. We simulated a modern architecture with point-to-point communication; however, our findings should apply to wormhole networks as well. We found that information is usually better than inference for driving process placement. The strategies we examine use load or status information to select placement locations; this information is explicitly disseminated and also piggybacked on normal communication. We found that extant point-to-point networks reduce the rate of information dissemination because transiting messages are ignored by intermediate nodes. From these studies, we have concluded that desirable workload distribution strategies will place new processes globally, rather than locally, to spread processes rapidly, but that local information should be used to refine global placement.
UR - http://www.scopus.com/inward/record.url?scp=84900339180&partnerID=8YFLogxK
U2 - 10.1109/DMCC.1990.556309
DO - 10.1109/DMCC.1990.556309
M3 - Conference contribution
T3 - Proceedings of the 5th Distributed Memory Computing Conference, DMCC 1990
SP - 984
EP - 993
BT - Architectures, Software Tools and Other General Issues
A2 - Walker, David W.
A2 - Stout, Quentin F.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th Distributed Memory Computing Conference, DMCC 1990
Y2 - 8 April 1990 through 12 April 1990
ER -