|
||||
|
Parallel Static and Dynamic Multi-Constraint Graph Partitioning
Algorithms that find good partitionings of irregular graphs are used in many application areas including scientific computing, data mining, VLSI design, and database storage and retrieval. In particular, in the context of scientific computing they are commonly used to map computational meshes onto the processors of high-performance parallel computers, such that each processor gets a roughly equal number of mesh elements (to ensure load balance) and the amount of interprocessor communication required to exchange information between adjacent mesh elements is minimized. However, the traditional graph partitioning problem formulation is limited in the types of applications that it can effectively model because it specifies that only a single quantity be load-balanced.
Professor George Karypis (Computer Science and Engineering) and Supercomputing Institute Fellow Professor Vipin Kumar (Computer Science and Engineering) and their research group have developed a parallel formulation of a multi-constraint graph partitioning algorithm, as well as a new partitioning algorithm for dynamic multi-phase simulations. Experimental results indicate that these algorithms result in high quality partitionings that satisfy multiple balance contraints. Furthermore, these algorithms are as scalable as the widely-used parallel graph partitioner implemented in ParMeTiS library (www-users.cs.umn.edu/~karypis/metis/).
![]() Figure 1: A computational mesh for a particle-in-cells simulation (a) and a computational mesh for a contact-impact simulation (b). The particle-in-cells mesh is partitioned so that both the number of mesh elements and the number of particles are balanced across the subdomains. Two partitionings are shown for the contact-impact mesh. The dashed partitioning balances only the number of mesh elements. The solid partitioning balances both the number of mesh elements and the number of surface (yellow) elements across the subdomains. |
Multi-phase simulations require that multiple quantities be load-balanced simultaneously. This is because synchronization steps exist between the different phases of the computations. This means that each phase must be individually load-balanced; it is not sufficient to simply sum up the relative times required for each phase and to compute a partitioning based on this sum. Doing so may lead to some processors having too much work during one phase of the computation (and so, these may still be working after other processors are idle), and not enough work during another. Instead, it is critical that every processor have an equal amount of work from each phase of the computation. Two examples are particle-in-cells and contact-impact simulations. The figure on page 1 illustrates the type of partitionings that are needed for these simulations: (a) shows a partitioning of the particle-in-cells mesh that balances both the number of mesh elements and the number of particles across the sub-domains. Meanwhile, (b) shows a mesh for a contact-impact simulation. During the contact detection phase, computation is performed only on the surface (i.e., lightly shaded) elements, while during the impact phase, computation is performed on all of the elements. Therefore, in order to ensure that both phases are load balanced, a partitioning must balance both the total number of mesh elements and the number of surface elements across the sub-domains. The solid partitioning in the figure (b) does this. The dashed partitioning is similar to what a traditional graph partitioner might compute. This partitioning balances only the total number of mesh elements. The surface elements are imbalanced by over 50%.
The new multi-constraint formulation of the graph partitioning problem is able to model the problem of balancing multiple computational phases simultaneously, while also minimizing the inter-processor communications. In this formulation, a weight vector of size m is assigned to each vertex of the graph. The static multi-constraint graph partitioning problem formulation then is to partition the vertices into k disjoint sub-domains such that the total weight of the edges that are cut by the partitioning (i.e., the edge-cut) is minimized, and such that each of their vertex weights are balanced across the sub-domains.
A serial static multi-constraint graph partitioning algorithm has been shown to compute high quality partitionings. This algorithm that is based on the multi-level paradigm (see Figure 3) is fast and effective. However, a parallel formulation of this algorithm is desirable. This is because the computational meshes that are used in parallel multi-phase simulations are often too large to fit in the memory of a single processor. A parallel partitioner can take advantage of the increased memory capacity of parallel machines. Thus, a parallel multi-constraint graph partitioner is key to the efficient execution of very large multi-phase simulations. A static partitioning computed a priori is sufficient for many types of multi-phase simulations.
![]() |
![]() |
Figure 2: The subdomain weights for a 4-way partitioning of a 3-constraint graph. The white bars represent the extra capacity in a subdomain for each weight given a 5% user-specified load imbalance tolerance. |
Figure 3: The three phases of multilevel k-way graph is successively decreased. During the initial partitioning phase, a k-way partitioning is computed, During the uncoarsening/refinement phase, the partitioning is successively refined as it is projected to the larger graphs. G 0 is the input graph, which is the finest graph. G i +1 is the next level coarser graph of G i. G 4 is the coarsest graph. |
Furthermore, simulations whose computational structure evolves dynamically (e.g., due to adaptive mesh methods or the movement of particles between sub-domains) require that dynamic repartitionings be periodically computed to maintain the load balance. These repartitionings should be computed to load balance each phase, to minimize the inter-processor communication, and also to minimize the difference between it and the previous partitioning (i.e., minimize the amount of data redistribution that is required to balance the loads). The dynamic multi-constraint graph partitioning problem formulation then is similar to the static problem, but has an additional objective that is to minimize the amount of data redistribution required to realize the new partitioning. Furthermore, dynamic repartitionings need to be computed in parallel, as it is not scalable to download the mesh onto a single processor each time that load balancing is required.
|
This information is available in alternative formats upon request by
individuals with disabilities. Please send email to
alt-format@msi.umn.edu
or call 612-624-0528.
| |||||||||
| |||||||||