Graph partitioning requires the division of a graph's vertex set into k equally sized subsets s. t. some
objective function is optimized. High-quality partitions are important for many applications, whose
objective functions are often NP-hard to optimize. Most state-of-the-art graph partitioning libraries use
a variant of the KernighanLin (KL) heuristic within a multilevel framework. While these libraries are very
fast, their solutions do not always meet all user requirements. Moreover, due to its sequential nature, KL
is not easy to parallelize. Its use as a load balancer in parallel numerical applications therefore requires
complicated adaptations. That is why we developed previously an inherently parallel algorithm, called
Bubble-FOS/C [H. Meyerhenke, B. Monien, S. Schamberger, Accelerating shape optimizing load balancing
for parallel FEM simulations by algebraic multigrid, in: Proceedings of the 20th IEEE International Parallel
and Distributed Processing Symposium, IPDPS'06, IEEE Computer Society, 2006, p. 57 (CD)], which
optimizes partition shapes by a diffusive mechanism. However, it is too slow for practical use, despite
its high solution quality.
In this paper, besides proving that Bubble-FOS/C converges towards a local optimum of a potential
function, we develop a much faster method for the improvement of partitionings. This faster method
called TruncCons is based on a different diffusive process, which is restricted to local areas of the graph
and also contains a high degree of parallelism. By coupling TruncCons with Bubble-FOS/C in a multilevel
framework based on two different hierarchy construction methods, we obtain our new graph partitioning
heuristic DibaP. Compared to Bubble-FOS/C, DibaP shows a considerable acceleration, while retaining the
positive properties of the slower algorithm. Experiments with popular benchmark graphs show that DibaP
computes consistently better results than the state-of-the-art libraries METIS and JOSTLE. Moreover,
with our new algorithm, we have improved the best known edge-cut values for a significant number
of partitionings of six widely used benchmark graphs.