Tech Report CS-03-03

Load Shedding in a Data Stream Manager

Nesime Tatbul, Ugur Cetintemel, Stan Zdonik, Mitch Cherniack, Michael Stonebraker

March 2003

Abstract:

A Data Stream Manager accepts push-based input from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, the system will become overloaded and latency will deteriorate. Under these conditions, the system will shed load, thus degrading the answer, in order to improve the observed latency of the results.

This paper examines a technique for dynamically inserting and removing drop operators into query plans as required by the current load. We examine two types of drops: the first drops a fraction of the tuples in a randomized fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan. We describe efficient solutions and present experimental evidence that they can bring the system back into the useful operating range without too much degradation in answer quality.

(complete text in pdf or gzipped postscript)