Tech Report CS-03-17

A Comparison of Stream-Oriented High-Availability Algorithms

Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Ugur Cetintemel, Michael Stonebraker, and Stan Zdonik

September 2003

Abstract:

Recently, significant efforts have focused on developing novel data-processing systems to support a new class of applications that commonly require sophisticated and timely processing of high-volume data streams. Early work in stream processing has primarily focused on stream-oriented languages and resource-constrained, one-pass query-processing. High availability, an increasingly important goal for virtually all data processing systems, is yet to be addressed.

In this paper, we first describe how the standard high-availability approaches used in data management systems can be applied to distributed stream processing. We then propose a novel stream-oriented approach that exploits the unique data-flow nature of streaming systems. Using analysis and a detailed simulation study, we characterize the performance of each approach and demonstrate that the stream-oriented algorithm significantly reduces runtime overhead at the expense of a small increase in recovery time.

(complete text in pdf or gzipped postscript)