Tech Report CS-10-01

C-MR: A Continuous MapReduce Processing Model for Low-Latency Stream Processing on Multi-Core Architectures

Nathan Backman, Karthik Pattabiraman, Ugur Cetintemel

February 2010

Abstract:

MapReduce has seen a large degree of success due to its simple yet expressive model that exposes data parallelism opportunities. While it has been widely adopted, MapReduce is not suited for stream-oriented, continuous applications due to a reliance on intermediate disk buffers, the absence of pipelined parallelism, and the inability to focus on minimizing the end-to-end latency of complex workflows of MapReduce jobs.

We address this limitation with the design of Continuous-MapReduce (C-MR) which allows the use of the MapReduce programming model with windowing constructs to support continuous execution on data streams.

While it is possible to execute MapReduce jobs continuously within the standard stream processing model, this solution is inflexible and suffers from inefficient computing resource utilization. We therefore adopt and adapt the MapReduce processing model to support streams to retain its benefits.

C-MR allows us to merge the benefits of both worlds by uniting generic computing nodes and windowing constructs. This allows for pipelined data consumption, incremental processing, and reduction in redundant computations for shared input streams. At the same time, generic computing nodes allow us to perform fine-grained load distribution and improve the scheduling of time-sensitive data to reduce end-to-end latencies. C-MR was developed for use on a multi-core system where we demonstrate its effectiveness at supporting high-performance stream processing.

(complete text in pdf)