Tech Report CS-91-35

Parallel VLSI Synthesis

Markus G. Wloka

May 1991

Abstract:

We investigate the parallel complexity of VLSI (very large scale integration) CAD (computer aided design) synthesis problems. Parallel algorithms are very appropriate in VLSI CAD because computationally intensive optimization methods are needed to derive ``good'' chip layouts.

We find that for many problems with polynomial-time serial complexity, it is possible to find an efficient parallel algorithm that runs in polylogarithmic time. We illustrate this by parallelizing the ``left-edge'' channel routing algorithm and the one-dimensional constraint-graph compaction algorithm.

Curiously enough, we find P-hard, or inherently difficult-to-parallelize algorithms when certain key heuristics are used to get approximate solutions to NP-complete problems. In particular, we show that many local search heuristics for problems related to VLSI placement are P-hard or P-complete. These include the Kernighan-Lin heuristic and the simulated annealing heuristic for graph partitioning. We show that local search heuristics for grid embeddings and hypercube embeddings based on vertex swaps are P-hard, as are any local search heuristics that minimize the number of column conflicts in a channel routing by accepting cost-improving swaps of tracks or subtracks. We believe that the P-hardness reductions we provide in this thesis can be extended to include many other important applications of local search heuristics.

Local search heuristics have been established as the method of choice for many optimization problems whenever very good solutions are desired. Local search heuristics are also among the most time-consuming bottlenecks in VLSI CAD systems, and would benefit greatly from parallel speedup. Our P-hardness results make it unlikely, however, that the exact parallel equivalent of a local search heuristic will be found that can find even a local minimum-cost solution in polylogarithmic time. The P-hardness results also put into perspective experimental results reported in the literature: attempts to construct the exact parallel equivalent of serial simulated-annealing-based heuristics for graph embedding have yielded disappointing parallel speedups.

We introduce the massively parallel {\it Mob} heuristic and report on experiments on the CM-2 Connection Machine. The design of the {\it Mob} heuristic was influenced by the P-hardness results. {\it Mob} can execute many moves of a local search heuristic in parallel. We applied our heuristic to the graph-partitioning, grid and hypercube-embedding problems, and report on an extensive series of experiments on the 32K-processor CM-2 Connection Machine that shows impressive reductions in edge costs. To obtain solutions that are within 5% of the best ever found, the {\it Mob} graph-partitioning heuristic needed less than nine minutes of computation time on random graphs of two million edges, and the {\it Mob} grid and hypercube embedding heuristics needed less than 30 minutes on random graphs of one million edges. Due to excessive run times, heuristics reported previously in the literature have been able to construct graph partitions and grid and hypergraph embeddings only for graphs that were 100 to 1000 times smaller than those used in our experiments. On small graphs, where simulated annealing and other heuristics have been extensively tested, our heuristic was able to find solutions of quality at least as good as these heuristics.

(complete text in pdf)