To benefit from parallel computers, programs must be partitioned into units that work in parallel. Once partitioned, these units, called processes, tasks, or threads, must be assigned to specific processors for execution. On shared-memory architectures this assignment is termed scheduling; on multicomputer systems it is called load distribution, to distinguish it from any local scheduling performed on individual nodes. We compared several load placement algorithms using instrumented programs and synthetic program models. The salient characteristics of these program traces (total computation time, total number of messages sent, and average message size) span two orders of magnitude. Load distribution algorithms determine the initial placement of processes, a precursor to the more general problem of load redistribution. We simulated a modern architecture with point-to-point communication; our findings should also apply to wormhole-routed networks. We found that information is usually better than inference for driving process placement. The strategies we examine use load or status information to select placement locations; this information is both explicitly disseminated and piggybacked on normal communication. We also found that existing point-to-point networks slow the dissemination of this information, because intermediate nodes ignore messages in transit. From these studies we conclude that a desirable workload distribution strategy places new processes globally, rather than locally, to spread processes rapidly, but uses local information to refine global placement.
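The closing recommendation, global placement refined by local information, can be sketched as follows. This is a minimal illustration under assumed details: the `Node` class, the neighbor map, the piggybacked load table, and the greedy least-loaded rule are assumptions for the sketch, not the algorithms evaluated in the study.

```python
class Node:
    """One node in a simulated multicomputer (illustrative, not the paper's model)."""

    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.load = 0  # processes currently placed on this node
        # Possibly stale view of every node's load, refreshed only when
        # load information is piggybacked on normal messages.
        self.load_table = [0] * num_nodes

    def piggyback(self, sender):
        """Update this node's view from load info carried on an ordinary message."""
        self.load_table[sender.node_id] = sender.load


def place_process(origin, nodes, neighbors):
    """Hypothetical two-step placement: a global step using origin's (stale)
    load table to spread processes rapidly, then a local step at the chosen
    node using accurate neighborhood loads to refine the choice."""
    # Global step: pick the apparently least-loaded node anywhere.
    target = min(range(len(nodes)), key=lambda i: origin.load_table[i])
    # Local step: the target compares its true load with its neighbors'
    # and shifts the process to a lighter neighbor if one exists.
    candidates = [target] + neighbors[target]
    best = min(candidates, key=lambda i: nodes[i].load)
    nodes[best].load += 1
    return best
```

Because the global table is refreshed only opportunistically, the global step may pick a stale target; the local refinement step is what corrects for that, which mirrors the abstract's conclusion.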