Tech Report CS-94-19

Critical-Path-Based Message Logging for Incremental Replay of Message-Passing Programs

Robert H. B. Netzer, Sairam Subramanian, and Jian Xu

April 1994

Abstract:

Debugging long-running, nondeterministic message-passing parallel programs requires incremental replay, the ability to exactly replay selected parts of an execution. To support incremental replay we must log enough messages and checkpoint processes often enough to allow any requested replay to complete quickly. We present an adaptive tracing strategy to keep the message-logging overhead down. We let the user specify a bound on the maximum time any replay request is allowed to take. Our algorithm tracks what each processes' critical path will be during a replay and logs enough messages to ensure the critical path will never exceed the bound. Overhead is kept low by not logging messages that can be recomputed during a replay. Experiments indicate that we log about 0.1-5% of the messages while still providing a reasonable bound on any replay.

(complete text in pdf or gzipped postscript)