IPP Symposium

Statistical Learning Theory meets Knowledge Discovery: Randomized Algorithms for Big Data Analytics

Matteo Riondato, PhD Student (Algorithms)

Classic techniques for knowledge discovery do not adequately address the challenges posed by Big Data in terms of scalability and effectiveness. This is mostly due to their use of tools from classic probability and statistics. In this talk we argue that it is possible to develop much more efficient algorithms for knowledge discovery in databases (KDD) and analytics on large datasets by using tools from statistical learning theory, a recently developed branch of mathematical statistics. As an example, we will show algorithms that mine frequent itemsets and association rules from large transactional datasets by random sampling, analyzed through the VC-dimension, the main tool from statistical learning theory.

Matteo Riondato is a Ph.D. candidate in the Computer Science department at Brown University, advised by Prof. Eli Upfal. His research focuses on using modern probability and statistical theory to develop efficient, scalable algorithms for knowledge discovery from databases that can address the challenges posed by Big Data.