IPP Symposium

Big Data on Big Machines: Challenges and Opportunities

Nadathur Satish, Researcher, Intel Labs

Fast analysis of large data sets is becoming increasingly important in a variety of fields. The data sizes, coupled with performance requirements, make it necessary to obtain efficient implementations of these analytics operations on large-scale clusters. In this talk, we describe the challenges involved in such an effort, ranging from high-level algorithmic innovations to low-level code optimizations. Addressing these challenges requires looking at different aspects of the workload: using different (potentially approximate) algorithms, exploiting the parallelism and memory hierarchy of modern processors, and accounting for system-level network and I/O considerations. These optimized workloads can help drive both hardware and programming model innovation. We discuss our efforts to develop a suitable benchmark suite for this purpose, with workloads taken from graph analytics, machine learning, and data-center computing. We also discuss our efforts to evaluate different programming models in these areas.

Nadathur Satish is a research scientist in Intel's Parallel Computing Lab, part of the Microprocessor and Programming Research division of Intel Labs. He received his undergraduate degree in computer science from the Indian Institute of Technology, Kharagpur, and his PhD in electrical engineering and computer sciences from the University of California, Berkeley. His research interests generally relate to parallel applications and architectures. His recent work includes mapping emerging applications onto parallel computer architectures, tools and languages to aid parallel workload development, and next-generation computer architectures.